Abstract. In this paper we investigate the behavior of iteratively decoded low-density parity-check codes over the binary erasure channel in the so-called "waterfall region." We show that the performance curves in this region follow a very basic scaling law. We conjecture that essentially the same scaling behavior applies in a much more general setting, and we provide some empirical evidence to support this conjecture. The scaling law, together with the error floor expressions developed previously, can be used for fast finite-length optimization.
1. Introduction. It is probably fair to say that the asymptotic behavior (as the blocklength tends to infinity) of iterative coding systems is reasonably well understood to date. Much less is known about the finite-length behavior, though.

As usual, the situation is clearest for the binary erasure channel (BEC(ε)). In this case, the finite-length analysis of the average performance of an ensemble boils down to a combinatorial problem. In [6], recursions were given to solve this combinatorial problem for some simple regular ensembles. These recursions were generalized in [21, 25] to deal with irregular ensembles and expurgation, and to compute block as well as bit erasure probabilities. Therefore, in principle, by solving the corresponding recursions it is possible to determine the average finite-length performance of any desired ensemble. In practice, though, this approach runs into computational limitations. Roughly, the complexity of the recursions grows by a factor n (the blocklength) for each degree of freedom of the ensemble. For reasonable lengths, therefore, only very simple ensembles can currently be analyzed in this way.

Given the computational complexity of an exact finite-length analysis, it is of great interest to find good approximations. Let us consider ensembles whose threshold is not determined by the stability condition, see [15]. In this case, the finite-length performance curve can be divided into two regions [20]: the waterfall region and the error floor region. In the waterfall region the performance is determined by 'large' (linear-sized) failures, and it improves quickly for decreasing erasure probabilities. In the error floor region, on the other hand, the performance is determined by 'small' (sublinear-sized) weaknesses in the graph. Fortunately, this second region is relatively easy to handle, as was demonstrated in [20].
2 Finite-Length Scaling

In this paper we address the issue of modeling the behavior of large error events. Our approach is motivated by a general conjecture stemming from statistical physics [8, 18]: if a system, parametrized by, let's say, ε, goes through a phase transition at a critical parameter, call it ε* (in our case the threshold), then it has repeatedly been observed that around this critical parameter there is a very specific scaling law. To be more concrete: we are interested in the probability of block error as a function of the blocklength n and the channel parameter ε, call it $P_B(n,\epsilon)$. We know that as n tends to infinity there is a phase transition at ε*, the iterative decoding threshold. Asymptotically, $P_B(n,\epsilon)$ tends to zero for ε < ε* and to one for ε > ε*. The scaling law refines this basic observation: one expects that there exist a non-negative constant ν and some non-negative function f(z) such that



$$\lim_{\substack{n\to\infty\\ n^{1/\nu}(\epsilon^*-\epsilon)=z}} P_B(n,\epsilon) = f(z). \qquad (1.1)$$



In other words, if one plots $P_B(n,\epsilon)$ as a function of $z = n^{1/\nu}(\epsilon^*-\epsilon)$ then, for increasing n, these finite-length curves are expected to converge to some function f(z). The function f(z) decreases smoothly from 1 to 0 as its argument changes from $-\infty$ to $+\infty$. This means that all finite-length curves are, to first order, scaled versions of some mother curve f(z). It might be helpful to think of the threshold ε* as the zero-order term in a Taylor series. Then the above scaling, if correct, represents the first-order term. In fact, one can even refine the analysis to include higher-order terms and write

$$P_B(n,\epsilon) = f(z) + n^{-\omega} g(z) + o(n^{-\omega}),$$

where ω is some positive real number and g(z) is the second-order correction term.

Such scaling laws are expected to apply in a wide array of situations in communications. The following is probably the simplest case in which such a scaling law can be proven rigorously. Let $\mathcal{H}(n,r)$ denote Shannon's random parity-check ensemble of codes of length n and rate r. Consider transmission over the BEC(ε) using a random element of $\mathcal{H}(n,r)$ with maximum likelihood (ML) decoding. Let H denote a random parity-check matrix, let $\mathcal{E}$ denote the set of erased positions, and let $H_{\mathcal{E}}$ denote the submatrix of H consisting of the columns of H indexed by $\mathcal{E}$. The ML block decoder succeeds if and only if $H_{\mathcal{E}}$ has rank $E := |\mathcal{E}|$. By definition, $H_{\mathcal{E}}$ is itself a random binary matrix of dimension $n\bar r \times E$, where $\bar r := 1 - r$. Since its E columns are independent and uniform over $\mathbb{F}_2^{n\bar r}$, and the (i+1)-st column must avoid the $2^i$ vectors spanned by the first i columns, we obtain

$$P\{\operatorname{rank}(H_{\mathcal{E}}) = E\} = \begin{cases} 0, & E > n\bar r,\\ \prod_{i=0}^{E-1}\left(1 - 2^{i-n\bar r}\right), & 0 \le E \le n\bar r.\end{cases}$$
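A quick calculation reveals that

$$\mathbb{E}_{\mathcal{H}(n,r)}[P_B(H,\epsilon)] = \sum_{E=0}^{n}\binom{n}{E}\epsilon^{E}(1-\epsilon)^{n-E}\left(1-\prod_{i=0}^{E-1}\left(1-2^{i-n\bar r}\right)\right) = Q\left(\frac{\sqrt{n}\,(\epsilon^*-\epsilon)}{\sqrt{\epsilon(1-\epsilon)}}\right) + O(1/\sqrt{n}),$$

where in the last step we used the fact that $\bar r = \epsilon^*$ and we defined the Q-function as usual by

$$Q(z) := \frac{1}{\sqrt{2\pi}}\int_{z}^{\infty} e^{-x^2/2}\,dx.$$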

In words, since the conditional probability of block erasure falls off steeply away from the threshold, the scaling law is dominated by the probability that the channel behaves atypically, i.e., that the number of erasures exceeds $n\epsilon^* = n\bar r$.
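As a quick numerical check of this ML example, the sketch below evaluates the exact sum over the number of erasures E (using the rank probability above) and compares it with the Gaussian approximation. It is a minimal illustration under our own assumptions: the number of checks $n\bar r$ is rounded to an integer, and the function names are ours, not the authors'.

```python
import math

def q_func(z):
    """Gaussian tail function Q(z) = P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def ml_block_erasure(n, rate, eps):
    """Exact average ML block erasure probability of Shannon's random
    parity-check ensemble H(n, r) over BEC(eps), summed over E."""
    m = round(n * (1.0 - rate))          # number of parity checks, n * rbar
    prod = 1.0                           # P{rank(H_E) = E}; empty product for E = 0
    total = 0.0
    for e in range(n + 1):
        p_fail = 1.0 if e > m else 1.0 - prod
        log_pmf = (math.lgamma(n + 1) - math.lgamma(e + 1) - math.lgamma(n - e + 1)
                   + e * math.log(eps) + (n - e) * math.log1p(-eps))
        total += math.exp(log_pmf) * p_fail
        if e < m:
            prod *= 1.0 - 2.0 ** (e - m)  # append the factor i = e for the next E
    return total

def ml_scaling(n, rate, eps):
    """Leading-order approximation Q(sqrt(n)(eps* - eps) / sqrt(eps(1 - eps)))."""
    eps_star = 1.0 - rate
    return q_func(math.sqrt(n) * (eps_star - eps) / math.sqrt(eps * (1.0 - eps)))

for eps in (0.46, 0.50, 0.54):
    print(f"eps={eps:.2f}  exact={ml_block_erasure(1000, 0.5, eps):.4f}  "
          f"approx={ml_scaling(1000, 0.5, eps):.4f}")
```

Working in log space for the binomial weights keeps the sum stable for blocklengths in the thousands; the remaining gap between the two columns is the $O(1/\sqrt{n})$ term.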

In this paper we prove a scaling law for iteratively decoded standard ensembles LDPC(n, λ, ρ) and Poisson ensembles LDPC(n, λ, r) when transmission takes place over the BEC(ε). In the sequel we give a leisurely overview of the main results. The precise statements can be found in Section 3. Some of the background material is summarized in Section 2.

Assume first that $l_{\min} \ge 3$, i.e., that the minimum left degree is at least three. Let G be a random element of the ensemble. Then, as stated more precisely in Section 3,

$$\mathbb{E}_{\text{LDPC}(n,\lambda,\rho)}[P_B(G,\epsilon)] = Q\left(\frac{\sqrt{n}\,(\epsilon^*-\epsilon)}{\alpha}\right) + o(1), \qquad (1.2)$$

where α is a quantity which depends on the ensemble and which is computable by a procedure similar to density evolution. This scaling law has a form almost identical to the one derived above for ML decoding of Shannon's random ensemble, with α² playing the role of a variance. We therefore dub the procedure which leads to the computation of α covariance evolution. We conjecture that in fact the following refined scaling law is valid:

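$$\mathbb{E}[P_B(G,\epsilon)] = Q\left(\frac{\sqrt{n}\,(\epsilon^*-\epsilon)}{\alpha}\right) + n^{-1/6}\,\frac{\beta}{\sqrt{2\pi}\,\alpha}\,e^{-\frac{n(\epsilon^*-\epsilon)^2}{2\alpha^2}} + O(n^{-1/3})$$
$$= Q\left(\frac{\sqrt{n}\,(\epsilon^* - \beta n^{-2/3} - \epsilon)}{\alpha}\right) + O(n^{-1/3}), \qquad (1.3)$$

where the term $\beta n^{-2/3}$ represents a shift of the threshold for finite lengths. Again, this constant β depends on the ensemble, and we will show how it can be computed.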
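The two forms of (1.3) can be compared numerically. The sketch below is a minimal illustration, not the procedure used to produce the figures: it plugs in the parameters quoted in the caption of Fig. 1 for the LDPC(n, x², x⁵) ensemble (ε* ≈ 0.42944, $\alpha = \sqrt{0.249869^2 + \epsilon^*(1-\epsilon^*)}$, β = 0.616045) and drops the $O(n^{-1/3})$ terms. The helper names are hypothetical.

```python
import math

def q_func(z):
    """Gaussian tail function Q(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Parameters quoted for LDPC(n, x^2, x^5) in the caption of Fig. 1.
EPS_STAR = 0.42944
ALPHA = math.sqrt(0.249869 ** 2 + EPS_STAR * (1.0 - EPS_STAR))
BETA = 0.616045

def shifted_form(n, eps, eps_star=EPS_STAR, alpha=ALPHA, beta=BETA):
    """Second line of (1.3) without the O(n^{-1/3}) term."""
    return q_func(math.sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps) / alpha)

def expanded_form(n, eps, eps_star=EPS_STAR, alpha=ALPHA, beta=BETA):
    """First line of (1.3): unshifted Q term plus the n^{-1/6} correction."""
    x = math.sqrt(n) * (eps_star - eps) / alpha
    correction = (n ** (-1.0 / 6.0) * beta / (math.sqrt(2.0 * math.pi) * alpha)
                  * math.exp(-x * x / 2.0))
    return q_func(x) + correction

for n in (1024, 2048, 4096, 8192):
    print(n, [round(shifted_form(n, e), 4) for e in (0.38, 0.40, 0.42)])
```

The two forms differ only by terms of order $n^{-1/3}$, so either can be used to draw the dashed curves; the shifted form makes the finite-length threshold shift $\beta n^{-2/3}$ explicit.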

Fig. 1 shows this scaling applied to the LDPC(n, x², x⁵) ensemble, which will serve as our running example. Note that the above scaling law models the behavior of large error events. A better comparison with equation (1.3) is therefore obtained by considering expurgated ensembles, see [20]. For $l_{\min} \ge 3$ the scaling (1.3) holds true asymptotically regardless of the expurgation scheme. This follows since, as shown in [25], the contribution to the block error probability stemming from sublinear-sized weaknesses in the graph decreases like¹ $\Theta(n^{1-\lceil l_{\min}/2\rceil})$. This is the probability of having a stopping set formed by a single variable node and $\lfloor l_{\min}/2\rfloor$ check nodes (such a constellation is allowed unless double edges are forbidden).
Fig. 1: Scaling of $\mathbb{E}_{\text{LDPC}(n,x^2,x^5)}[P_B(G,\epsilon)]$ for transmission over BEC(ε) and belief propagation decoding. The threshold for this combination is ε* ≈ 0.42944, see Table 4.1. The blocklengths/expurgation parameters are n/s = 1024/24, 2048/43, 4096/82 and 8192/147, respectively. (More precisely, we assume that the ensembles have been expurgated so that graphs in this ensemble do not contain stopping sets of size s or smaller.) The solid curves represent the exact ensemble averages. The dashed curves are computed according to the refined scaling law stated in Conjecture 3.1 with scaling parameters $\alpha = \sqrt{0.249869^2 + \epsilon^*(1-\epsilon^*)}$ and β = 0.616045, see Table 4.2.



The situation is somewhat more complicated once λ'(0) > 0. In this case the block erasure probability consists of two parts: the part which stems from linear-sized error events and which scales like (1.3), and a contribution which stems from sublinear-sized weaknesses in the graph. The contribution from the latter part depends crucially on the expurgation scheme employed and does not necessarily vanish as n → ∞.

In the above discussion we focused on the block erasure probability. The equivalent scaling law for the bit erasure probability is a straightforward adaptation: if the decoder fails at the critical² point then, asymptotically, it incurs a fixed bit erasure probability, call it ν* (the fractional size of the residual graph). Therefore, if we multiply the above expressions by ν* we get the corresponding scaling law for the bit erasure probability.³ Fig. 2 shows the resulting approximation.

¹ In the sequel we follow the standard convention to write O(·) to denote an upper bound, but we write Θ(·) to denote the exact behavior (up to constants).
Fig. 2: Scaling of $\mathbb{E}_{\text{LDPC}(n,x^2,x^5)}[P_b(G,\epsilon)]$ for transmission over BEC(ε) and belief propagation decoding. The threshold for this combination is ε* ≈ 0.42944, see Table 4.2. The blocklengths/expurgation parameters are n/s = 1024/24, 2048/43 and 4096/82, respectively. The solid curves represent the exact ensemble averages. The dashed curves are computed according to the refined scaling law stated in Conjecture 3.1 with scaling parameters $\alpha = \sqrt{0.249869^2 + \epsilon^*(1-\epsilon^*)}$ and β = 0.616045, see Table 4.2.



The basic form of the scaling law applies to regular as well as irregular ensembles.⁴ The computation of the scaling parameters, though, becomes significantly more involved in the irregular case, and therefore we limit ourselves in this paper to providing the detailed calculations only for regular ensembles. Fig. 3 demonstrates the scaling law for the block erasure probability applied to the irregular ensemble $\text{LDPC}(n, \lambda = \frac{1}{6}x + \frac{5}{6}x^3, \rho = x^5)$. In this case the scaling parameters were simply fitted to the data.

The performance of ensembles whose threshold is determined by the sta- 
bility condition scales in a fundamentally different way. The simplest such rep- 
resentatives are cycle codes. We will discuss cycle codes in some detail since 



² See Section 2 for a discussion of this notion.

³ The approximation can be improved away from the threshold by multiplying the above expression by the typical size of the failure for that particular ε.

⁴ This is true as long as the threshold is not determined by the stability condition and is determined by a single critical point, see Sections 2 and 3.
Fig. 3: Scaling of $\mathbb{E}_{\text{LDPC}(n,\lambda=\frac{1}{6}x+\frac{5}{6}x^3,\rho=x^5)}[P_B(G,\epsilon)]$ for transmission over BEC(ε) and belief propagation decoding. The threshold for this combination is ε* ≈ 0.48281. The blocklengths/expurgation parameters are n/s = 350/14, 700/23 and 1225/35. The solid curves represent the simulated ensemble averages. The dashed curves are computed according to the refined scaling law stated in Conjecture 3.1 with scaling parameters $\alpha = \sqrt{0.276^2 + \epsilon^*(1-\epsilon^*)}$ and β = 0.642274. These parameters were fitted to the data.



we conjecture that the same scaling applies to all ensembles for which the stability condition determines the threshold. Fig. 4 shows block erasure curves for the LDPC(n, x, r = 1/2) cycle Poisson ensemble with expurgation parameter s = 1 for $n = 2^i$, i = 8, 10, 12, 14. Also shown are the limiting block erasure probability curve and our approximation for the block erasure probability around the threshold. Clearly, these curves differ significantly in nature from the curves discussed before. As investigated in more detail in Section 5, the block erasure probability does not show a threshold effect: instead, it converges to a smooth limiting curve. Around the threshold we have the following scaling law:

$$\mathbb{E}_{\text{LDPC}(n,x,r)}[P_B(G,\epsilon)] = 1 - A\,a\,n^{-1/6} f\!\left(b\,n^{1/3}(\epsilon-\epsilon^*)\right)\left\{1 + o(n^{-1/3})\right\}, \qquad (1.4)$$

where $a = \bar r^{-1/6}$, $b = \bar r^{-2/3}$ and A is a constant which depends on the expurgation scheme used. The form of the mother curve f(x) is given in Lemma 3.2.
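For cycle codes the failure event has a simple graph-theoretic description: every variable node has degree two, so an erased bit can be viewed as an edge between its two check nodes, and the peeling decoder fails exactly when the erased edges contain a cycle (a stopping set). The Monte Carlo sketch below uses this to estimate the block erasure probability of a Poisson cycle ensemble. It assumes that each variable node picks two distinct checks uniformly at random (so there are no double edges, matching s = 1). It only illustrates the smooth, threshold-free behavior described above and is not how the exact curves of Fig. 4 were computed. The function names are ours.

```python
import random

def find(parent, a):
    """Root of a in a union-find forest, with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def cycle_block_erasure_mc(n, eps, rate=0.5, trials=200, seed=1):
    """Monte Carlo estimate of E[P_B] for a Poisson cycle ensemble over BEC(eps).

    Each variable node connects to two distinct checks chosen uniformly among
    m = round(n * (1 - rate)); an erased variable node is treated as an edge
    between them, and decoding fails iff the erased edges close a cycle."""
    rng = random.Random(seed)
    m = round(n * (1.0 - rate))
    failures = 0
    for _ in range(trials):
        parent = list(range(m))
        for _ in range(n):
            u = rng.randrange(m)
            v = rng.randrange(m - 1)
            if v >= u:
                v += 1                    # v is uniform over the checks other than u
            if rng.random() >= eps:
                continue                  # bit received: it never enters the residual graph
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:                  # erased edges close a cycle: stopping set
                failures += 1
                break
            parent[ru] = rv
    return failures / trials

for eps in (0.10, 0.25, 0.45):
    print(f"eps = {eps:.2f}: P_B estimate {cycle_block_erasure_mc(1000, eps):.3f}")
```

Union-find makes each trial essentially linear in n; a length-two cycle (two erased bits sharing both checks) is detected the same way, since the second edge finds both endpoints already connected.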

1.1. Scaling for General Channels. In many ways this paper only repre- 
sents the very first step in what seems to be a promising research direction. The 
most important extension is undoubtedly the one to general binary-input output- 
symmetric channels. Although there is currently little hope of attacking this prob- 
lem rigorously, empirically such a scaling seems to be true for general channels as 
Fig. 4: Scaling of $\mathbb{E}_{\text{LDPC}(n,x,r=1/2)}[P_B(G,\epsilon)]$ for transmission over the BEC(ε) and belief propagation decoding. The (bit) threshold for this combination is ε* = 1/4. The solid curves are the exact ensemble averages for blocklengths equal to n = 256, 1024, 4096 and 16384. The bold curve is the limiting (in n) block erasure curve. The dashed curves are the finite-length approximations computed according to equation (1.4).



well. In principle, any (function of the) channel parameter can be used for stating the scaling law; however, we make this choice slightly less arbitrary by the following convention. Consider a family of binary-input output-symmetric memoryless channels parametrized by, let's say, σ. Let C(σ) denote the capacity for the parameter σ. The role of ε* − ε in the case of the BEC(ε) is then played by C(σ) − C(σ*), i.e., we use the scaling law

$$P_B = Q\left(\frac{\sqrt{n}\left(C(\sigma) - C(\sigma^*) - \beta n^{-2/3}\right)}{\alpha}\right). \qquad (1.5)$$

Note that for the BEC(ε), C(ε) = 1 − ε, so that this choice is consistent with our previous convention. The parameters α and β reported in the captions of Figs. 5 to 7 are defined according to the above formula.
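To make (1.5) concrete for a channel other than the BEC, the sketch below evaluates it for the BSC, where $C(p) = 1 - h_2(p)$, using the values quoted in the caption of Fig. 6 (ε* ≈ 0.084, α = 1.156, β = 0.1). Passing the capacity as a function lets the same helper reproduce the BEC form of (1.3) with C(ε) = 1 − ε. The function names are illustrative only.

```python
import math

def q_func(z):
    """Gaussian tail function Q(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bsc_capacity(p):
    """Capacity 1 - h2(p) of the BSC(p), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

def capacity_scaling(n, param, param_star, capacity, alpha, beta):
    """Scaling law (1.5): Q(sqrt(n) * (C(param) - C(param*) - beta * n^(-2/3)) / alpha)."""
    gap = capacity(param) - capacity(param_star) - beta * n ** (-2.0 / 3.0)
    return q_func(math.sqrt(n) * gap / alpha)

# Values quoted in the caption of Fig. 6 (BSC, belief propagation, LDPC(n, x^2, x^5)).
P_STAR, ALPHA, BETA = 0.084, 1.156, 0.1
for n in (1024, 2048, 4096, 8192):
    print(n, [round(capacity_scaling(n, p, P_STAR, bsc_capacity, ALPHA, BETA), 4)
              for p in (0.06, 0.07, 0.08)])
```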



Fig. 5 shows performance curves for the LDPC(n, λ = x², ρ = x⁵) ensemble transmitted over the binary-input additive white Gaussian noise (BAWGN) channel and a quantized version of belief propagation. Fig. 6 shows the corresponding curves for the same ensemble when transmission takes place over the binary symmetric channel (BSC) and belief propagation decoding is used. Finally, Fig. 7 shows the performance curve for the Gallager algorithm A. Although these cases are quite distinct, one can see that the empirically fitted scaling laws are in excellent agreement with the simulated curves.
Fig. 5: Scaling of $\mathbb{E}_{\text{LDPC}(n,x^2,x^5)}[P_B(G,\sigma)]$ for transmission over BAWGNC(σ) and a quantized version of belief propagation decoding implemented in hardware. The threshold for this combination is $(E_b/N_0)^* \approx 1.19658$ dB. The blocklengths are n = 1000, 2000, 4000, 8000, 16000 and 32000, respectively. The solid curves represent the simulated ensemble averages. The dashed curves are computed according to the refined scaling law (1.3) with scaling parameters α = 0.8694 and β = 5.884. These parameters were fitted to the empirical data.



1.2. Applications of Scaling to Finite-Length Optimization. An important application of the scaling laws, which is left for future work, is finite-length optimization. Combined with analytic expressions for the contribution to the error probability stemming from small (sublinear-sized) weaknesses of the graph, the scaling laws can be used to approximate the performance of finite-length ensembles. Note also that, from the limited examples exhibited in this paper, it appears that the scaling parameters depend only weakly on the degree distribution. This suggests that a good optimization strategy for finite-length ensembles is to optimize the infinite-length threshold under the condition that the contribution of the error floor leads to acceptable overall performance.

1.3. Connected Work and Outline. In [13] an approach to analyze the finite-length behavior of turbo codes was introduced. This method, which the authors call the "EXIT band chart", is used to describe the probabilistic convergence of the iterative decoding algorithm and provides an approximation of the BER in the waterfall region. Somewhat related is also the work by Zémor and Cohen, who study in [24] the "threshold" behavior of general classes of codes. A preliminary numerical investigation of the scaling (1.2) was presented in [17]. Partial accounts of the present work appeared in [2, 3].

In Section 2 we introduce the necessary notation and review some of the
Fig. 6: Scaling of $\mathbb{E}_{\text{LDPC}(n,x^2,x^5)}[P_B(G,\epsilon)]$ for transmission over BSC(ε) and belief propagation decoding. The threshold for this combination is ε* ≈ 0.084. The blocklengths/expurgation parameters are n/s = 1024/19, 2048/39, 4096/79 and 8192/79, respectively. The solid curves represent the ensemble averages obtained via simulation. The dashed curves are computed according to the refined scaling law stated in equation (1.3) with scaling parameters α = 1.156 and β = 0.1.



background material, in particular the density evolution analysis as introduced by Luby et al. in [15]. In Section 3 we state and prove the general form of the scaling laws. In Section 4 we then discuss for regular ensembles how the scaling parameters can be computed. In Section 5 we discuss in detail the refined scaling law and how the shift parameter can be computed. Some of the background material and some detailed calculations have been relegated to the appendices.

2. Review. In this section we recall some basic facts on the density evolution 
analysis of low-density parity-check (LDPC) codes under iterative decoding. We 
also fix some of the notation to be used throughout the paper. 

2.1. Ensembles and Channel Models. In this paper we consider both standard and Poisson low-density parity-check ensembles. Standard ensembles are denoted in the usual way as LDPC(n, λ, ρ), where n is the blocklength and λ and ρ denote the degree distributions from an edge perspective, see [15]. For the Poisson ensemble the right degree distribution is Poisson. More precisely, given the left degree distribution λ and the rate r, the right degree distribution tends to $\rho(x) = e^{(x-1)/(\bar r\int_0^1\lambda(u)\,du)}$ as n → ∞. We will denote such an ensemble by LDPC(n, λ, r). To sample from the Poisson ensemble, pick a bipartite graph with n variable nodes and the proper variable node degree distribution. Connect each edge emanating from a variable node to one of the $n\bar r$ check nodes, where the choice is taken
Fig. 7: Scaling of $\mathbb{E}_{\text{LDPC}(n,x^2,x^5)}[P_B(G,\epsilon)]$ for transmission over BSC(ε) and Gallager Algorithm A decoding. The threshold for this combination is ε* ≈ 0.03946. The blocklengths/expurgation parameters are n/s = 512/50, 1024/70, 2048/100 and 4096/200, respectively. The solid curves represent the ensemble averages obtained via simulation. The dashed curves are computed according to the refined scaling law stated in equation (1.3) with scaling parameters α = 1.11 and β = 0.0.



according to a uniform probability distribution. 
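The sampling procedure just described can be sketched in a few lines. For simplicity the version below assumes a regular left degree (a hypothetical simplification; the ensemble allows any λ). It checks that the resulting check-node degrees are close to Poisson, with mean equal to the number of edges divided by the $n\bar r$ check nodes. Double edges are not excluded, exactly as in the unexpurgated procedure.

```python
import math
import random
from collections import Counter

def sample_poisson_graph(n, left_degree, rate, seed=0):
    """Hypothetical helper: return (m, edges) for a Poisson LDPC graph whose
    variable nodes all have degree left_degree; each edge independently picks
    one of m = round(n * (1 - rate)) check nodes uniformly at random."""
    rng = random.Random(seed)
    m = round(n * (1.0 - rate))
    edges = [(v, rng.randrange(m)) for v in range(n) for _ in range(left_degree)]
    return m, edges

m, edges = sample_poisson_graph(20000, 2, 0.5)
check_degree = Counter(c for _, c in edges)               # degree of every used check
degree_hist = Counter(check_degree[c] for c in range(m))  # unused checks count as degree 0
mean_degree = len(edges) / m                              # 2 * 20000 / 10000 = 4
for d in range(9):
    poisson_pmf = math.exp(-mean_degree) * mean_degree ** d / math.factorial(d)
    print(d, round(degree_hist[d] / m, 4), round(poisson_pmf, 4))
```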

From time to time it is more convenient to describe the degree distributions from a node perspective. Our notation for the left and right node degree distributions is Λ and P, respectively, and we have the following important relationships:

$$\lambda(1) = \rho(1) = 1; \qquad \Lambda(1) = n, \quad P(1) = n\bar r.$$
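The conversion between the two perspectives can be sketched as follows, assuming the usual convention $\lambda(x) = \sum_i \lambda_i x^{i-1}$, where $\lambda_i$ is the fraction of edges attached to degree-i nodes. Dividing by the degree and normalizing gives the node fractions $\Lambda_i/\Lambda(1)$. Counting edges from both sides gives $P(1)/\Lambda(1) = \int_0^1\rho / \int_0^1\lambda$, and hence the design rate. The helper names are ours; exact fractions avoid rounding.

```python
from fractions import Fraction

def node_perspective(edge_dist):
    """Convert edge-perspective coefficients {i: lambda_i} into node-perspective
    fractions {i: Lambda_i / Lambda(1)}."""
    weights = {i: Fraction(c) / i for i, c in edge_dist.items()}
    total = sum(weights.values())          # equals the integral of lambda over [0, 1]
    return {i: w / total for i, w in weights.items()}

def design_rate(lam, rho):
    """r = 1 - P(1)/Lambda(1) = 1 - (integral of rho) / (integral of lambda)."""
    int_lam = sum(Fraction(c) / i for i, c in lam.items())
    int_rho = sum(Fraction(c) / i for i, c in rho.items())
    return 1 - int_rho / int_lam

# lambda(x) = x/6 + 5x^3/6 and rho(x) = x^5, the irregular ensemble of Fig. 3.
lam = {2: Fraction(1, 6), 4: Fraction(5, 6)}
print(node_perspective(lam), design_rate(lam, {6: 1}))
```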

It will sometimes be necessary to consider expurgated ensembles. Although there are many possible expurgation mechanisms, we will limit our discussion to the following simple scheme. Consider, e.g., the case of expurgated Poisson ensembles. Define ELDPC(n, λ, r, s) as the subset of all elements in LDPC(n, λ, r) whose minimum stopping set size is at least s + 1. As always, endow this set with the uniform probability distribution. E.g., ELDPC(n, λ, r, 2) denotes the Poisson ensemble which contains no stopping sets of size one or two. The same notational convention is used for expurgated standard ensembles.

We will consider two channel models. The more familiar one is the binary erasure channel with parameter ε, denoted by BEC(ε), where each bit is erased independently with probability ε. Sometimes, though, it is more convenient to consider the model BEC(n, nε), the channel model in which exactly nε out of all n bits are erased and where the set of these nε erased bits is chosen uniformly from all $\binom{n}{n\epsilon}$ such choices.
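The difference between the two channel models is easiest to see by sampling erasure patterns. In the minimal sketch below, nε is rounded to an integer (an assumption, since the text takes nε to be an integer). The number of erasures fluctuates under BEC(ε) but is fixed under BEC(n, nε).

```python
import random

def bec_erasures(n, eps, rng):
    """BEC(eps): erase each of the n positions independently with probability eps."""
    return {i for i in range(n) if rng.random() < eps}

def bec_fixed_erasures(n, eps, rng):
    """BEC(n, n*eps): erase exactly n*eps positions (rounded to an integer here),
    chosen uniformly among all subsets of that size."""
    return set(rng.sample(range(n), round(n * eps)))

rng = random.Random(0)
sizes = [len(bec_erasures(1000, 0.3, rng)) for _ in range(200)]
print(min(sizes), max(sizes), sum(sizes) / len(sizes))   # fluctuates around 300
print(len(bec_fixed_erasures(1000, 0.3, rng)))            # 300
```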
We consider scaling laws for both bit and block erasure probabilities, and we will always consider ensemble averages. E.g., in its full notational glory,

$$\mathbb{E}_{\text{ELDPC}(n,\lambda(x)=x,r=\frac{1}{2},s=1)}[P_B(G,n\epsilon)]$$

will denote the expected block erasure probability for cycle Poisson ensembles of rate one-half containing no double edges when transmitted over the channel BEC(n, nε). Because of the obvious notational burden we will often replace this with shorthands, and we might write, e.g.,

$$P_B(n, \lambda(x) = x, r = \tfrac{1}{2}, s = 1, n\epsilon).$$

We might even omit some of the parameters if they are clear from the context.

2.2. Decoding. There are essentially two alternative ways of defining the decoding algorithm for the BEC(ε). Although they are equivalent in performance, they are quite different from the point of view of analysis. First, we can think of the standard message-passing decoder in which messages are passed in parallel from left to right and then back from right to left until the codeword has been decoded or no further progress is achieved [12]. Alternatively, one can think of the decoder as a process which tries to determine one bit at a time in a greedy fashion. This is the point of view introduced by Luby et al. in [14, 15], and we will adopt it in this paper. More precisely, the decoder proceeds as follows. Given the received message, the decoder passes all known values on to the check node side. These values are accumulated at the check nodes and this partial metric is stored. Further, all known nodes and all edges over which messages have been passed are deleted. In this way one arrives at a residual graph which has a certain degree distribution. The decoder now proceeds in an iterative fashion. If the residual graph contains no degree-one check nodes, the decoding process stops. Otherwise, the decoder randomly chooses one such degree-one check node and passes its partial metric to the connected variable node. This variable node is now known. Its value is communicated to all connected check nodes, where the value is accumulated into the partial metric. The involved variable node, check node and all involved edges are deleted. In this way a new residual graph results and a new iteration starts.
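A minimal sketch of the greedy decoder described above follows. It assumes the graph is given as a list of check neighborhoods without repeated edges, and erased bits are marked with None. It mirrors the description (accumulate known values at the checks, then repeatedly resolve a randomly chosen degree-one check) rather than aiming for efficiency: for simplicity it rescans all checks in every iteration. The function name and the `rng` parameter are our own choices.

```python
import random

def peel_decode(checks, received, rng=None):
    """Greedy erasure (peeling) decoder.

    checks   -- list of check nodes, each a list of distinct variable indices
    received -- list of 0/1 channel outputs, with None marking an erasure
    Returns the word after decoding stops; bits still None form a stopping set."""
    rng = rng or random.Random(0)
    word = list(received)
    partial = [0] * len(checks)              # accumulated parity of known neighbours
    residual = [set() for _ in checks]       # erased neighbours still in the residual graph
    var_checks = [[] for _ in word]
    for c, nbrs in enumerate(checks):
        for v in nbrs:
            var_checks[v].append(c)
            if word[v] is None:
                residual[c].add(v)
            else:
                partial[c] ^= word[v]        # known value passed to the check side
    while True:
        degree_one = [c for c, r in enumerate(residual) if len(r) == 1]
        if not degree_one:
            return word                      # no degree-one check left: decoder stops
        c = rng.choice(degree_one)
        v = residual[c].pop()
        word[v] = partial[c]                 # the check's partial metric fixes the bit
        for c2 in var_checks[v]:             # pass the new value on and delete its edges
            if v in residual[c2]:
                residual[c2].remove(v)
                partial[c2] ^= word[v]

# Usage on a small (7, 4) Hamming parity-check structure.
checks = [[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]
codeword = [1, 0, 1, 1, 0, 0, 1]
received = [None, None, 1, 1, 0, 0, 1]
print(peel_decode(checks, received))    # [1, 0, 1, 1, 0, 0, 1]
```

Erasing positions 0 to 3 instead leaves every check with at least two erased neighbors, so the decoder stops immediately: those four bits form a stopping set.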

2.3. Density Evolution. The advantage of the second description lies in the fact that the decoding process is seen as a stochastic process with small increments: at each iteration the change of the degree distribution is a random variable and this change is small. By standard arguments one can show that in the large-blocklength limit the behavior of individual instances follows, with high probability, the expected behavior, and this expected behavior can be expressed as the solution of a differential equation. This is the idea introduced in [15].

First recall that, by definition of the ensemble, the degree distribution of the residual graph constitutes a sufficient statistic, i.e., given this degree distribution all residual graphs which are compatible with this degree distribution (and are compatible with the general description of the ensemble, like, e.g., the degree of expurgation) are equally likely. Therefore, in order to analyze the behavior of the decoder it suffices to analyze the evolution of this degree distribution. Let us now recall the solution of the infinite-length analysis given in [15], since it forms the starting point for our investigation. Let $x_l$ denote the fraction of erasure messages entering the variable nodes at a given point in time (here the l stands for right-to-left message). In terms of this parametrization, the evolution of the system (i.e., the evolution of the degree distribution of the residual graph) is given by

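$$L_i(x_l) = \epsilon\Lambda_i x_l^{i}, \qquad i \ge 2,$$
$$R_0(x_l) = P(1) - \sum_{j\ge 1} R_j(x_l),$$
$$R_1(x_l) = \Lambda'(1)\,\epsilon\lambda(x_l)\left[x_l - 1 + \rho(1-\epsilon\lambda(x_l))\right], \qquad (2.1)$$
$$R_i(x_l) = \sum_{j\ge i} P_j\binom{j}{i}\left(\epsilon\lambda(x_l)\right)^{i}\left(1-\epsilon\lambda(x_l)\right)^{j-i}, \qquad i \ge 2. \qquad (2.2)$$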

Here, $L_i(x_l)$ ($R_i(x_l)$) denotes the expected number of variable (check) nodes of degree i at state $x_l$. In the sequel we will refer to these equations as density evolution equations. Rather than considering the evolution of the whole degree distribution, it often suffices to look at some smaller set of parameters. As we have discussed, the most important parameter in the decoding process is the number of degree-one check nodes; denote it by $s(x_l) := R_1(x_l)$. Further important parameters are the size of the residual graph, $v(x_l) := \sum_i L_i(x_l)$, and the number of check nodes of degree at least two, $t(x_l) := \sum_{i\ge 2} R_i(x_l)$. Let $\nu(x_l)$, $\sigma(x_l)$ and $\tau(x_l)$ denote the respective fractions, $v(x_l) = \Lambda(1)\nu(x_l)$, $s(x_l) = \Lambda(1)\sigma(x_l)$, $t(x_l) = \Lambda(1)\tau(x_l)$.

Example 1. [Density Evolution of LDPC(n, x², x⁵)-Ensemble] Fig. 8 depicts the evolution of σ (dashed line) and τ (solid line) as a function of ν for the ensemble LDPC(n, x², x⁵) for the choice ε = ε* ≈ 0.4294. Note that for this choice of ε the expected number of check nodes of degree one reaches zero at some critical time of the decoding process. □
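For the running example λ(x) = x², ρ(x) = x⁵ we have Λ(x) = n x³, P(x) = (n/2)x⁶ and Λ'(1) = 3n, so σ, τ and ν from (2.1) and (2.2) become explicit functions of x. The sketch below evaluates them at ε ≈ 0.42944 and locates the point where σ touches zero. Because σ also vanishes trivially as x → 0, it searches for the interior minimum of the bracket x − 1 + ρ(1 − ελ(x)) on x ∈ [0.01, 1]; the grid resolution is an arbitrary choice of ours.

```python
from math import comb

EPS = 0.42944   # approximately the threshold of LDPC(n, x^2, x^5)

def bracket(x, eps=EPS):
    """x - 1 + rho(1 - eps*lambda(x)) for lambda(x) = x^2, rho(x) = x^5."""
    return x - 1.0 + (1.0 - eps * x ** 2) ** 5

def sigma(x, eps=EPS):
    """s(x)/n: from (2.1) with Lambda'(1) = 3n and Lambda(1) = n."""
    return 3.0 * eps * x ** 2 * bracket(x, eps)

def tau(x, eps=EPS):
    """t(x)/n: sum of (2.2) over i >= 2, with P(x) = (n/2) x^6."""
    y = eps * x ** 2                    # eps * lambda(x)
    return 0.5 * sum(comb(6, i) * y ** i * (1.0 - y) ** (6 - i) for i in range(2, 7))

def nu(x, eps=EPS):
    """v(x)/n: sum_i L_i(x)/n = eps * x^3 (all variable nodes have degree 3)."""
    return eps * x ** 3

# sigma also tends to 0 as x -> 0, so search the interior for the critical point.
grid = [k / 10000 for k in range(100, 10001)]
x_crit = min(grid, key=bracket)
print(f"x* ~ {x_crit:.3f}, nu* ~ {nu(x_crit):.3f}, sigma(x*) ~ {sigma(x_crit):.1e}")
print(f"at x = 1: sigma = {sigma(1.0):.4f}, tau = {tau(1.0):.4f}, nu = {nu(1.0):.4f}")
```

The computed ν at the critical point should come out close to the value ν* ≈ 0.203 quoted in the caption of Fig. 8.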

The density evolution equations completely specify the asymptotic behavior of the decoder. Recall that the decoder stops if the number of degree-one check nodes has reached zero. If this point is reached before the size of the residual graph has reached zero, a decoding error occurs. Therefore, if we plot σ(x) as a function of x for a given channel parameter ε, we know that the decoder will succeed with high probability if and only if σ(x) > 0 for all x ∈ (0, 1]. From equation (2.1) we see that σ(x) > 0 for x ∈ (0, 1] is equivalent to

$$\rho(1-\epsilon\lambda(x)) > 1 - x, \qquad \forall x \in (0,1]. \qquad (2.3)$$

We can therefore define the threshold ε*(λ, ρ) as

$$\epsilon^*(\lambda,\rho) := \sup\{\epsilon \in [0,1] : \rho(1-\epsilon\lambda(x)) > 1 - x,\ \forall x \in (0,1]\}.$$
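This definition suggests a simple numerical procedure: test condition (2.3) on a grid of x values and bisect on ε. The sketch below does exactly that. The grid size and number of bisection steps are arbitrary choices, and a finite grid only approximates the "for all x" condition. For LDPC(n, x², x⁵) it should reproduce ε* ≈ 0.42944, and for the irregular ensemble of Fig. 3 a value close to 0.48281.

```python
def satisfies_2_3(eps, lam, rho, grid):
    """Condition (2.3) checked on a finite grid of x values in (0, 1]."""
    return all(rho(1.0 - eps * lam(x)) > 1.0 - x for x in grid)

def bec_threshold(lam, rho, points=10000, iters=40):
    """Bisection for eps*(lambda, rho) = sup{eps : (2.3) holds}."""
    grid = [k / points for k in range(1, points + 1)]
    lo, hi = 0.0, 1.0          # (2.3) holds at eps = 0 since rho(1) = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if satisfies_2_3(mid, lam, rho, grid):
            lo = mid
        else:
            hi = mid
    return lo

print(bec_threshold(lambda x: x ** 2, lambda x: x ** 5))                   # close to 0.42944
print(bec_threshold(lambda x: x / 6 + 5 * x ** 3 / 6, lambda x: x ** 5))   # close to 0.4828
```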
Fig. 8: The evolution of σ and τ as a function of ν for the LDPC(n, x², x⁵) ensemble and ε = ε* ≈ 0.4294. At ν = ν* ≈ 0.203, σ(ν) has a minimum and touches the ν-axis.



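We say that x* is a critical point if σ(x) reaches a minimum at x = x* and if this minimum is zero, i.e., if

$$\rho(1-\epsilon^*\lambda(x^*)) = 1 - x^* \quad\text{and}\quad \epsilon^*\lambda'(x^*)\,\rho'(1-\epsilon^*\lambda(x^*)) = 1.$$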



To simplify matters, we will only discuss ensembles that have a single critical point. The extension to several critical points poses no problems in principle but is technically more cumbersome. All regular ensembles have this property. We say that a degree distribution is unconditionally stable if x* > 0, i.e., if the threshold is not determined by the stability condition. It is easy to check that this is the case for all regular ensembles with $l_{\min} \ge 3$. Otherwise, i.e., if x* = 0, we say that the ensemble is marginally stable. The typical examples are cycle code ensembles. As we will see, the nature of the scaling is drastically different in the two cases. Finally, we will assume that the degree distributions λ(x) and ρ(x) (or just λ(x) for Poisson ensembles) are polynomials. In this case the density evolution equations have only a finite number of minima and maxima. This is a purely technical condition to avoid some pathological cases which are of no practical interest.

3. Main Results and Discussion. The following statements apply both to standard ensembles and Poisson ensembles. Generically we will denote such an ensemble by LDPC(n, λ, ρ). In the Poisson case we can think of $\rho(x) = e^{(x-1)/(\bar r\int_0^1\lambda(u)\,du)}$.

3.1. Unconditionally Stable Ensembles. The basic scaling law as given in (1.2) is stated more precisely in the following.
Lemma 3.1. [Scaling of Unconditionally Stable Ensembles] Consider transmission over the BEC(ε) using random elements from an ensemble LDPC(n, λ, ρ) which has a single critical point and is unconditionally stable. Let ε* = ε*(λ, ρ) denote the threshold and let ν* denote the fractional size of the residual graph at the critical point corresponding to the threshold. Fix z to be $z := \sqrt{n}(\epsilon^*-\epsilon)$. Let $P_b(n,\lambda,\rho,\epsilon)$ denote the expected bit erasure probability and let $P_{B,\gamma}(n,\lambda,\rho,\epsilon)$ denote the expected block erasure probability due to errors of size at least $\gamma\nu^*$, where γ ∈ (0, 1). Then, as n tends to infinity,

$$P_{B,\gamma}(n,\lambda,\rho,\epsilon) = Q\left(\frac{z}{\alpha}\right)\left(1 + o_n(1)\right),$$
$$P_b(n,\lambda,\rho,\epsilon) = \nu^*\,Q\left(\frac{z}{\alpha}\right)\left(1 + o_n(1)\right),$$

where α = α(λ, ρ) is a constant which depends on the ensemble.

Proof. First note that if λ'(0) = 0, i.e., if there are no degree-two variable nodes, then the block erasure probability is dominated over the whole range of ε by large error events (when n tends to infinity). This means that $P_{B,\gamma}$ is equal to the ordinary block error probability.

This is no longer true once λ'(0) > 0. If 0 < λ'(0)ρ'(1) < 1, then the ensemble can be expurgated in order to eliminate small (sublinear) weaknesses in the graph, and the above scaling law will then account for all errors. If, on the other hand, no such expurgation is done or if λ'(0)ρ'(1) > 1, then besides the contribution to $P_B$ stemming from large error events, the contribution stemming from sublinear-sized weaknesses in the graph will also be non-negligible. The above scaling law only applies to the first contribution. The bit erasure probability is not affected by these considerations, since the contribution of sublinear-sized stopping sets in the graph vanishes as n tends to infinity. Fortunately, the effect of sublinear-sized stopping sets is relatively easy to assess by union bounding techniques. The total erasure probability can be represented as the sum of these two contributions. For a more detailed discussion we refer the reader to [7, 19, 25].

Our approach will be to consider first a situation slightly simplified with respect to the one encountered in iterative decoding. This will be done in Section 4 (see Proposition 4.1 and the corresponding appendix). The basic tools needed for the proof of this lemma will be introduced in such a simplified context. It turns out that the main conclusions hold true when the simplifying assumptions are removed. This will be shown in a later appendix. □

We conjecture that in fact the following refined scaling law is valid. 

Conjecture 3.1. [Refined Scaling of Unconditionally Stable Ensembles] Consider transmission over the BEC($\epsilon$) using random elements from an ensemble LDPC($n, \lambda, \rho$) which has a single critical point and is unconditionally stable. Let $\epsilon^* = \epsilon^*(\lambda,\rho)$ denote the threshold and let $\nu^*$ denote the fractional size of the residual graph at the threshold. Let $P_b(n,\lambda,\rho,\epsilon)$ denote the expected bit erasure probability and let $P_{B,\gamma}(n,\lambda,\rho,\epsilon)$ denote the expected block erasure probability due to errors of size at least $\gamma\nu^*$, where $\gamma \in (0,1)$. Fix $z$ to be $z := \sqrt{n}\,(\epsilon^* - \beta n^{-2/3} - \epsilon)$. Then, as $n$ tends to infinity,

$$P_{B,\gamma}(n,\lambda,\rho,\epsilon) = Q\!\left(\frac{z}{\alpha}\right)\bigl(1 + O(n^{-1/3})\bigr),$$

$$P_b(n,\lambda,\rho,\epsilon) = \nu^*\, Q\!\left(\frac{z}{\alpha}\right)\bigl(1 + O(n^{-1/3})\bigr),$$

where $\alpha = \alpha(\lambda,\rho)$ and $\beta = \beta(\lambda,\rho)$ are constants which depend on the ensemble.
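To see what the conjectured law predicts numerically, it can be evaluated directly. The sketch below assumes the form stated above with $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ and drops the $(1+O(n^{-1/3}))$ factor. Its default parameters are taken from the (3,6) row of Table 1, approximating $\beta$ by the tabulated $\beta/\Omega$ since $\Omega$ is stated to be close to 1. The function names are ours.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def refined_block_erasure(n, eps, eps_star=0.4294, alpha=0.249869, beta=0.616949):
    """Conjectured P_{B,gamma} ~ Q(z / alpha), z = sqrt(n) (eps* - beta n^(-2/3) - eps).

    Defaults: the (3,6) row of Table 1, with beta approximated by beta/Omega
    (Omega close to 1). The (1 + O(n^(-1/3))) factor is dropped.
    """
    z = math.sqrt(n) * (eps_star - beta * n ** (-2.0 / 3.0) - eps)
    return q_function(z / alpha)

if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        print(n, [round(refined_block_erasure(n, e), 4) for e in (0.40, 0.41, 0.42)])
```

The curves should steepen with $n$ around $\epsilon^* - \beta n^{-2/3}$. Whether the tabulated $\alpha$ refers to BEC($n, n\epsilon$) or to BEC($\epsilon$) should be checked against Section 4; equation (3.2) converts between the two.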

This conjecture can be proven in the simplified context mentioned above (and defined in Section 4). This is done in Section 5. At the end of the same section, we provide a heuristic argument suggesting that the simplifying assumptions are in fact irrelevant.

In the remainder of this section we provide an informal (albeit essentially correct) justification of the above scaling forms. The question of how to compute the scaling parameters is deferred to Sections 4 (for the variance $\alpha^2$) and 5 (for the shift $\beta$).

Consider the behavior of the individual trajectories of the decoding process for particular choices of the graph and the channel realization. We will see that these trajectories closely follow the expected value (given by the density evolution equations) and that their standard deviation is of order $\sqrt{n}$. Consider now the decoding process and assume that the channel parameter $\epsilon$ is close to $\epsilon^*$. If $\epsilon = \epsilon^*$, then at the critical point the expected number of degree-one check nodes is zero. Assume now that we vary $\epsilon$ slightly. From the density evolution equation (2.1) we see that the expected change in the fraction of degree-one check nodes ($\sigma = s/n$) at the critical point is



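$$\left.\frac{\partial\sigma}{\partial\epsilon}\right|_{x=x^*,\,\epsilon=\epsilon^*} = -L'(1)\,\epsilon^*\lambda(x^*)^2\,\rho'\bigl(1-\epsilon^*\lambda(x^*)\bigr). \qquad (3.1)$$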



If we vary $\epsilon$ so that $\Delta\epsilon$ is of order $\Theta(1)$, then we conclude from (3.1) that the expected number of degree-one check nodes at the critical point is of order $\Theta(n)$. Since the standard deviation is of order $\Theta(\sqrt{n})$, with high probability the decoding process will either succeed (if $\epsilon - \epsilon^* < 0$) or die (if $\epsilon - \epsilon^* > 0$). The interesting scaling happens if we choose our variation of $\epsilon$ in such a way that $\Delta\epsilon = z/\sqrt{n}$, where $z$ is a constant. In this case the expected gap at the critical point scales in the same way as the standard deviation, and one would expect the probability of error to stay constant. Varying the constant $z$ then gives rise to the scaling function $f(z)$, cf. equation (1.1).

We will further see that the distribution of states at any time before hitting the $s = 0$ plane is Gaussian and that the evolution of its covariance matrix is governed by a set of differential equations, in the same way as the mean. We will therefore call these equations the covariance evolution equations. As an example, consider the ensemble LDPC($n, x^2, x^5$) and transmission over the channel BEC($n, n\epsilon$). In this case the residual graph at the start of the decoding process has exactly $n\epsilon$ variable nodes, and since at each step of the decoding process exactly one variable node is peeled off, the size of the residual graph after the $\ell$-th decoding step is exactly $n\epsilon - \ell$ (assuming the decoder has not stopped prematurely). As we will discuss in more detail in Section 4, it suffices in this case to keep track of the tuple $(s,t)$ (i.e., we do not need to keep track of the whole degree distribution of the residual graph). Fig. 9 shows the evolution of $(s,t)$ as a function of the size of the residual graph for the choice $\epsilon = \epsilon^*$. The solid line corresponds to the density evolution equation (albeit now in three-dimensional form). The dot indicates the critical point. The ellipsoids represent the covariance matrix; more precisely, they represent contours of constant probability. Note that this picture is slightly misleading: the ellipsoids really live on a scale of $\sqrt{n}$, whereas the rest of the graph is scaled by $n$, i.e., for increasing length the ellipsoids concentrate more and more around the expected value. Those trajectories that hit the $s = 0$ plane die. This corresponds to the part of the ellipsoids that vanishes.

Fig. 9: A pictorial representation of density and covariance evolution for the LDPC($n, x^2, x^5$) ensemble. Notice that the ellipsoids corresponding to $(s,t)$ covariances should be regarded as living on a smaller (by a factor $\sqrt{n}$) scale than the typical trajectory.
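The peeling process described above is easy to simulate. The following sketch is our own illustration, not the procedure used to produce the figures: it samples a graph from the standard (socket) LDPC($n, x^{l-1}, x^{r-1}$) ensemble, erases exactly $n\epsilon$ variable nodes as in BEC($n, n\epsilon$), and runs the peeling decoder while recording the number $s$ of degree-one check nodes after each step. Multi-edges produced by the socket model are simply kept, and the names `sample_graph`, `peel` and `block_erasure_rate` are hypothetical.

```python
import random

def sample_graph(n, l, r, rng):
    """Socket model: the l edges of each variable are matched to check sockets at random."""
    assert (n * l) % r == 0, "n*l must be divisible by r"
    m = n * l // r
    check_sockets = [c for c in range(m) for _ in range(r)]
    rng.shuffle(check_sockets)
    # var_checks[v] lists the check attached to each of the l edges of v (multi-edges allowed)
    return [check_sockets[v * l:(v + 1) * l] for v in range(n)], m

def peel(n, l, r, eps, rng):
    """Return (success, trajectory of s) for one graph and one BEC(n, n*eps) erasure pattern."""
    var_checks, m = sample_graph(n, l, r, rng)
    erased = rng.sample(range(n), round(n * eps))
    degree = [0] * m                     # residual degree of each check node
    check_vars = [[] for _ in range(m)]  # erased variables on each check (with multiplicity)
    for v in erased:
        for c in var_checks[v]:
            degree[c] += 1
            check_vars[c].append(v)
    unresolved = set(erased)
    queue = [c for c in range(m) if degree[c] == 1]
    s_trajectory = [len(queue)]
    while queue:
        c = queue.pop()
        if degree[c] != 1:
            continue  # stale entry: the check lost its last edge after being queued
        v = next(u for u in check_vars[c] if u in unresolved)
        unresolved.discard(v)  # the degree-one check determines v
        for c2 in var_checks[v]:
            degree[c2] -= 1
            if degree[c2] == 1:
                queue.append(c2)
        s_trajectory.append(sum(1 for d in degree if d == 1))
    return not unresolved, s_trajectory

def block_erasure_rate(n, l, r, eps, trials, seed=1):
    """Fraction of trials in which the peeling decoder gets stuck."""
    rng = random.Random(seed)
    return sum(not peel(n, l, r, eps, rng)[0] for _ in range(trials)) / trials
```

Sweeping `eps` around 0.4294 for increasing `n` should reproduce, qualitatively, the steepening waterfall of the (3,6) ensemble; the recorded trajectories show $s$ fluctuating around its density evolution value until it either reaches zero prematurely or the residual graph is empty.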

One can quantify the probability for the process to hit the $s = 0$ plane as follows. Stop density and covariance evolution when the fraction of residual variable nodes reaches the critical value $\nu^*$. At this point the probability distribution of the state is well approximated by a Gaussian with a given mean and covariance for $s > 0$ (while it is obviously $0$ for $s < 0$). Estimate the survival probability (i.e., the probability of not hitting the $s = 0$ plane at any time) by summing the Gaussian distribution over $s > 0$. Obviously this integral can be expressed in terms of a $Q$-function.

We will see that the above description indeed leads to the scaling behavior stated in Lemma 3.1. Where does the shift in Conjecture 3.1 come from? It is easy to understand that we were a bit optimistic (i.e., we underestimated the error probability) in the above calculation: we correctly excluded from the sum the part of the Gaussian distribution lying in the $s < 0$ half-space, since trajectories contributing to this part must have hit the $s = 0$ plane at some point in the past. On the other hand, we cannot be certain that trajectories with $s > 0$ when $\nu$ crosses $\nu^*$ did not hit the $s = 0$ plane at some time in the past and bounce back (or will not hit it at some later point). We refer to Section 5 for an in-depth discussion of how to estimate this effect.

Let us finally recall that the performance over the BEC($\epsilon$) channel can be easily derived from the results obtained using the model BEC($n, n\epsilon$). One can derive the erasure probability for the first case by summing the conditional erasure probability, where the conditioning is on the number of erasures. Notice that the number of erasures for the BEC($\epsilon$) is asymptotically Gaussian with mean $n\epsilon$ and standard deviation $\sqrt{n\epsilon(1-\epsilon)}$. Since this standard deviation is of the same order as the gap to the threshold, such a convolution gives a non-trivial contribution, unlike in the Shannon ensemble example discussed earlier. It is easy to verify that this convolution amounts to computing the parameter $\alpha^2$, cf. Lemma 3.1, as the sum of two contributions: one due to the channel fluctuations and the other due to covariance evolution. More precisely, we have

$$\alpha^2_{\mathrm{BEC}(\epsilon)} = \alpha^2_{\mathrm{BEC}(n,n\epsilon)} + \epsilon^*(1-\epsilon^*), \qquad (3.2)$$

where we took $\epsilon = \epsilon^*$ since we are interested in the region $\epsilon = \epsilon^* + O(n^{-1/2})$ and we can neglect $O(n^{-1/2})$ corrections. Hereafter we shall mostly focus on the BEC($n, n\epsilon$) channel. The reader is invited to use formula (3.2) for translating the results whenever necessary.
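Equation (3.2) is the statement that two independent Gaussian fluctuations add in variance: averaging $Q(\sqrt{n}(\epsilon^* - k/n)/\alpha)$ over a Gaussian number of erasures $k$ yields another $Q$-function with enlarged variance. A minimal sketch of the conversion, together with a Monte Carlo check of this averaging argument (the function names and the numerical values in the check are ours, chosen only for illustration):

```python
import math
import random

def alpha_bec(alpha_fixed, eps_star):
    """Translate alpha from BEC(n, n*eps) to BEC(eps) using (3.2)."""
    return math.sqrt(alpha_fixed ** 2 + eps_star * (1.0 - eps_star))

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def averaged_block_erasure(n, eps, eps_star, alpha_fixed, samples=100000, seed=0):
    """Average the BEC(n, k) law Q(sqrt(n)(eps* - k/n)/alpha) over k ~ Gaussian(n eps, n eps (1 - eps))."""
    rng = random.Random(seed)
    sd = math.sqrt(n * eps * (1.0 - eps))
    total = 0.0
    for _ in range(samples):
        k = rng.gauss(n * eps, sd)
        total += q_function(math.sqrt(n) * (eps_star - k / n) / alpha_fixed)
    return total / samples
```

The averaged value agrees with `q_function(sqrt(n) * (eps_star - eps) / alpha_bec(alpha_fixed, eps_star))` up to Monte Carlo noise and the replacement of $\epsilon(1-\epsilon)$ by $\epsilon^*(1-\epsilon^*)$.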

3.2. Marginally Stable Ensembles. As already mentioned, marginally stable ensembles are expected to follow a different scaling from the one described in Lemma 3.1. We will limit our discussion to the simplest case, namely that of cycle code ensembles. We conjecture, though, that the form of the scaling law is quite general and applies to all marginally stable ensembles. The cycle Poisson ensemble is slightly easier to handle analytically than the standard ensemble. We will therefore formulate our results mainly for this case.

Lemma 3.2. [Scaling of Block Probability for Cycle Poisson Ensembles] Consider transmission over BEC($n, n\epsilon$) using elements from ELDPC($n, \lambda(x) = x, r, s$). Then

$$P_B(n, \lambda(x) = x, r, s, n\epsilon) = 1 - A(s)\, a\, n^{-1/6}\, f\bigl(b\, n^{1/3}(\epsilon - \epsilon^*)\bigr)\bigl(1 + o(1)\bigr),$$

where $a = r^{-1/6}$, $b = r^{-2/3}$, $A(s) = \exp\bigl\{\sum_{s'=1}^{s}\frac{1}{2s'}\bigr\}$, and
f( X ) = ^f^e-^ P (3 2 ^;3/2,-l). 
Hereby, $p(u;\alpha,\beta)$ is a so-called stable density with representation

$$p(u;\alpha,\beta) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itu}\exp\Bigl\{-|t|^{\alpha}\, e^{-i\frac{\pi}{2}\beta K(\alpha)\,\mathrm{sign}(t)}\Bigr\}\,dt,$$

and $K(\alpha) = 1 - |1-\alpha|$.
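The stable density can be evaluated directly from this representation. For real $u$ the integrand at $-t$ is the complex conjugate of the integrand at $t$, so $p(u;\alpha,\beta) = \frac{1}{\pi}\,\mathrm{Re}\int_0^\infty e^{-itu}\exp\{-t^{\alpha}e^{-i\pi\beta K(\alpha)/2}\}\,dt$, which the sketch below approximates with Simpson's rule on a truncated interval. The truncation point and step count are our own choices and are adequate only when the integrand decays quickly (i.e., $\cos(\pi\beta K(\alpha)/2) > 0$, which holds for $\alpha = 3/2$, $\beta = -1$). The checks use two textbook special cases: $\alpha = 2, \beta = 0$ (a Gaussian with variance 2) and $\alpha = 1, \beta = 0$ (the standard Cauchy density).

```python
import cmath
import math

def stable_density(u, alpha, beta, t_max=60.0, steps=20000):
    """p(u; alpha, beta) from its integral representation, via Simpson's rule on [0, t_max]."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    k_alpha = 1.0 - abs(1.0 - alpha)
    rot = cmath.exp(-1j * math.pi * beta * k_alpha / 2.0)  # rotation factor for t > 0

    def integrand(t):
        return cmath.exp(-1j * t * u - (t ** alpha) * rot)

    h = t_max / steps
    total = integrand(0.0) + integrand(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    # Conjugate symmetry in t: p(u) = (1/pi) * Re of the half-line integral.
    return (total * h / 3.0).real / math.pi
```

For the lemma one would call, e.g., `stable_density(u, 1.5, -1.0)` at the required arguments.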

Proof. In principle one could arrive at the above result by proceeding in the 
same fashion as for unconditionally stable ensembles, i.e., one could employ the 
tools of density evolution and covariance evolution. 

We will, however, use an entirely different approach. Note that there is a one-to-one correspondence between elements of ELDPC($n, \lambda(x) = x, r, s = 2$) and random graphs on $nr$ nodes with exactly $n$ edges, see [20]. If $s = 2$, then double edges and cycles of length four are excluded from the Tanner graph. Therefore, each variable node connects two distinct check nodes and no two variable nodes connect the same pair. If we therefore identify each variable node (and the two edges that emanate from it) with one edge in an ordinary graph, we get the desired correspondence. Further, the decoder will be successful if and only if this random graph is a forest, i.e., a collection of trees. Let $F(l,k)$ denote the number of forests on $l$ labeled nodes with $k$ components. Such a forest has $l - k$ edges and therefore corresponds to a constellation on $v = l - k$ variable nodes. Since these variable nodes can be ordered arbitrarily, it follows that there are $v!\,F(nr, nr - v)$ constellations on $v$ variable nodes which do not contain stopping sets.

It remains to find the total number of constellations on $v$ variable nodes which are compatible with the expurgation scheme. The desired result will then follow by dividing these two quantities. Assume $s = 0$. Then the total number of constellations on $v$ variable nodes is equal to $(nr)^{2v}$, since for each edge we can choose one of the $nr$ check nodes. Let $n_s(G)$ denote the number of cycles of length $2s$ in a fixed portion of the bipartite graph $G$ of size $v$. It is easy to verify (and is a well-studied problem in random graphs) that $\mathbb{E}[n_s(G)] = \frac{1}{2s}\left(\frac{2v}{nr}\right)^{s}\bigl(1 + O(1/v)\bigr)$. Further, it is known that for each fixed $s$ the random variables $(n_1, \dots, n_s)$ are asymptotically (as $n$ and $v$ tend to infinity with a fixed ratio) independent and follow a Poisson distribution [4]. Finally, for the Poisson ensemble we have $\epsilon^* = r/2$, so that around the critical value $v = \epsilon^* n = nr/2$ we have $\frac{2v}{nr} = 1$. It follows that around the threshold the total number of constellations which are compatible with the expurgation scheme behaves like

$$T(v \sim n\epsilon^*) = (nr)^{2v}\, e^{-\sum_{s'=1}^{s}\frac{1}{2s'}}\bigl(1 + O(1/v)\bigr) = \frac{(nr)^{2v}}{A(s)}\bigl(1 + O(1/v)\bigr).$$

From this, the block erasure probability around the threshold follows immediately once $F(l,k)$ is known; namely, we have

$$P_B(n, \lambda(x) = x, r, s, n\epsilon \sim n\epsilon^*) = 1 - A(s)\,\frac{(n\epsilon)!\,F(nr, nr - n\epsilon)}{(nr)^{2n\epsilon}}\bigl(1 + O(1/n)\bigr).$$

One of the most celebrated formulas in enumerative combinatorics states that there are $l^{l-2}$ labeled trees on $l$ nodes [23]. Unfortunately, there does not seem to exist an equally elementary expression for the number of labeled forests. The situation is aggravated by the fact that we are interested in the region where the average number of edges per node is around one. Exactly around this region the graph goes through a phase transition, and so the behavior of $F(l,k)$ is nontrivial even in the limit of large sizes. Fortunately, the asymptotic behavior has been determined by Britikov [5], and the result has been made accessible (to the English-speaking audience) in the book by Kolchin [11]. Our result now follows by employing the asymptotic approximation stated in Theorem 1.4.4 in [11].⁵ □
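Although no closed form is available, $F(l,k)$ is easy to tabulate exactly for moderate $l$: condition on the size $j$ of the tree containing a fixed node, choose its other $j-1$ members, count that tree with Cayley's formula $j^{j-2}$, and recurse on the remaining nodes. This recursion is a standard counting argument added here for illustration (it is not taken from [5] or [11]); it can serve, e.g., to check the asymptotic approximation at small sizes.

```python
from functools import lru_cache
from math import comb

def labeled_trees(j):
    """Cayley's formula: number of labeled trees on j >= 1 nodes."""
    return 1 if j <= 2 else j ** (j - 2)

@lru_cache(maxsize=None)
def forests(l, k):
    """Number F(l, k) of forests on l labeled nodes with exactly k trees."""
    if l == 0:
        return 1 if k == 0 else 0
    if k <= 0 or k > l:
        return 0
    # The tree containing node 1 has j nodes; the other l - j nodes form k - 1 trees.
    return sum(comb(l - 1, j - 1) * labeled_trees(j) * forests(l - j, k - 1)
               for j in range(1, l - k + 2))
```

For example, `forests(4, 1)` equals $4^2 = 16$, and `sum(forests(4, k) for k in range(1, 5))` gives 38, the total number of labeled forests on four nodes.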

Note that for the cycle case the maximum-likelihood decoder and the iterative decoder perform identically in terms of block erasure probability. This is true since in this case the condition that there are no stopping sets is equivalent to the condition that there are no cycles, which in turn implies that there is no nonzero codeword supported on the erased positions. Note, however, that this is no longer true once we look at the resulting bit erasure probability.

We also note that if we want to get the scaling law for the channel BEC($\epsilon$), we need to convolve the above curves with the binomial distribution with mean $n\epsilon$. However, on the scale $\epsilon^* - \epsilon = \Theta(n^{-1/3})$, the effect of the channel fluctuations vanishes in the large-blocklength limit. The leading correction to the scaling law of Lemma 3.2 coming from the channel consists in the substitution

$$f(x) \to f(x) + \frac{\epsilon^*(1-\epsilon^*)\,b^2}{2}\, f''(x)\, n^{-1/3} + O(n^{-1/2}). \qquad (3.3)$$

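The following lemma characterizes the corresponding limiting block erasure probability curve.

Lemma 3.3. [Asymptotic Block Erasure Probability Curve] Consider transmission over BEC($n, n\epsilon$) or BEC($\epsilon$) using random elements from ELDPC($n, \lambda(x) = x, r, s$). Then

$$\lim_{n\to\infty} P_B(n, \lambda(x) = x, r, s, n\epsilon) = \begin{cases} 1 - \sqrt{1 - \frac{2\epsilon}{r}}\,\exp\Bigl\{\sum_{s'=1}^{s}\frac{1}{2s'}\bigl(\frac{2\epsilon}{r}\bigr)^{s'}\Bigr\}, & \epsilon < \epsilon^* = \frac{r}{2},\\[4pt] 1, & \epsilon \ge \epsilon^*. \end{cases}$$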

The corresponding asymptotic bit erasure probability curve under iterative decoding can be obtained through a standard density evolution analysis, and it is given in parametric form by

$$\bigl(\epsilon,\; P_b\bigr) = \left(\frac{x}{\lambda(1-\rho(1-x))},\; \frac{x\,L(1-\rho(1-x))}{\lambda(1-\rho(1-x))}\right),$$

where $x \in (x^*, 1]$ and $x^*$ is the solution to the equation $\epsilon^*\lambda(1-\rho(1-x)) = x$. Figure 10 shows the resulting bit and block erasure curves for ELDPC($n, \lambda(x) = x, r = j, s = 1$).
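The parametric form is convenient numerically: sweeping $x$ traces the curve, and the threshold is the minimum of $\epsilon(x) = x/\lambda(1-\rho(1-x))$, attained at $x^*$. The sketch below does this by a plain grid search for a regular standard ensemble with $\lambda(y) = y^{l-1}$, $\rho(x) = x^{r-1}$ and node-perspective $L(y) = y^l$; this choice of ensemble and the grid resolution are our assumptions for illustration (the cycle-code curves of Fig. 10 would require the $\rho$ of that ensemble instead). For the regular ensembles the resulting thresholds can be compared with the $\epsilon^*$ column of Table 1.

```python
def de_curve(l, r, points=100000):
    """Return (eps_star, curve) for the regular (l, r) ensemble.

    curve lists (eps, P_b) for x in [x*, 1], using eps(x) = x / lambda(1 - rho(1 - x))
    and P_b = eps(x) * L(1 - rho(1 - x)).
    """
    lam = lambda y: y ** (l - 1)      # edge-perspective variable degrees
    rho = lambda x: x ** (r - 1)      # edge-perspective check degrees
    node_l = lambda y: y ** l         # node-perspective variable degrees
    xs = [i / points for i in range(1, points + 1)]
    eps = [x / lam(1.0 - rho(1.0 - x)) for x in xs]
    i_star = min(range(len(xs)), key=eps.__getitem__)  # grid index of x*
    curve = [(eps[i], eps[i] * node_l(1.0 - rho(1.0 - xs[i])))
             for i in range(i_star, len(xs))]
    return eps[i_star], curve
```

With `de_curve(3, 6)` the returned threshold is close to the value 0.4294 listed in Table 1, and the curve ends at $(\epsilon, P_b) = (1, 1)$ for $x = 1$.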

Cycle codes cannot be expurgated up to some linear fraction of the blocklength, since the numbers of stopping sets of sizes $s_1, \dots, s_k$ are jointly Poisson and have means equal to $(2\epsilon/r)^{s_i}/(2s_i)$, respectively. Below the threshold $\epsilon^* = r/2$, the bit erasure probability scales as $1/n$. Expurgation only changes the coefficient of this scaling. A simple calculation yields

P b («,A(x) =x,r,s,ne) = ^L s (jj (l + 0(l/»)), (3.4) 



⁵ The reader is warned that there is a slight typo in Theorem 1.4.4 as stated in [11].
Fig. 10: The bit and block erasure probability for ELDPC($n, \lambda(x) = x, r = j, s = 1$) for $n = 2^i$, $i = 8, 10, 12, 14$. As can be seen from the picture, the block erasure curves actually converge to a limiting (non-zero) curve over the whole range of $\epsilon$, whereas the bit erasure curves decrease to zero below the threshold for increasing blocklengths. Also shown are the results of using the scaling laws for the block erasure probability as stated in Lemma 3.2.

where we defined the function

$$L_s(x) := \sum_{s'=s+1}^{\infty}\frac{x^{s'}}{s'} = -\log(1-x) - \sum_{s'=1}^{s}\frac{x^{s'}}{s'}.$$

As shown in Fig. 11, this formula provides a good approximation to the bit erasure probability away from the critical region. Notice in fact that the coefficient of the $1/n$ term in Eq. (3.4) diverges as $\epsilon \to \epsilon^*$.
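The closed form of $L_s$ makes this divergence explicit: as $x \to 1$ the term $-\log(1-x)$ blows up while the subtracted polynomial stays bounded. A small sketch comparing the closed form with the truncated tail series (the argument $x$ is left generic; the function names are ours):

```python
import math

def l_s_closed(s, x):
    """L_s(x) = -log(1 - x) - sum_{s'=1}^{s} x^{s'}/s', valid for 0 <= x < 1."""
    return -math.log1p(-x) - sum(x ** k / k for k in range(1, s + 1))

def l_s_series(s, x, terms=10000):
    """Truncated tail series sum_{s'=s+1}^{s+terms} x^{s'}/s'."""
    return sum(x ** k / k for k in range(s + 1, s + terms + 1))

if __name__ == "__main__":
    for x in (0.5, 0.9, 0.99, 0.999):
        print(x, l_s_closed(1, x))  # grows like -log(1 - x) as x -> 1
```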



Fig. 11: Comparison of the exact bit erasure curves (solid lines) with the analytic expression given in (3.4) (dashed lines) for $n = 2^i$, $i = 8, 10, 12, 14$, and $\epsilon < \epsilon^*$.



4. Computation of the Variance Parameter. In the previous section we saw that the basic scaling law, cf. Lemma 3.1, only depends on the variance $\alpha^2$. In this section we will work out in detail the calculation of this parameter. In Section 5 we will present the method used for computing $\beta$, which is needed for the conjectured refined form of the scaling law.

Although conceptually it is straightforward to write down the equations for the general irregular case, the actual computations are quite cumbersome. We will therefore proceed as follows. In Section 4.1 we discuss the covariance evolution equations in an abstract setting. These are applied to particular regular LDPC ensembles in Section 4.2.

4.1. General Covariance Evolution. We regard iterative decoding as a Markov process in a finite-dimensional space. The examples in the next two subsections will make clear how this framework can be adapted to particular code ensembles.

Consider a family of Markov chains $X_{n,0}, X_{n,1}, \dots, X_{n,t}, \dots$ parametrized by $n \in \mathbb{N}$ and taking values in $\mathbb{Z}^{d+1}$. For iterative decoding applications, $n$ will represent the blocklength. We drop the subscript $n$ hereafter. Let the transition probability be

$$\mathbb{P}\{X_{t+1} = x' \mid X_t = x\} = W(x' - x \mid x), \qquad (4.1)$$

and the initial condition be a single non-random state $X_0 = x_0 \in \mathbb{Z}^{d+1}$. In iterative decoding the initial condition is actually a distribution over states. This case is easy to treat by first conditioning on the initial state and then convolving with the initial distribution. We will denote the $d+1$ coordinates of the state $x$ as

$$\bigl(x^{(0)}, x^{(1)}, \dots, x^{(d)}\bigr) = x \in \mathbb{Z}^{d+1}. \qquad (4.2)$$

We denote the corresponding random variable by $\bigl(X^{(0)}, X^{(1)}, \dots, X^{(d)}\bigr)$.

In the following we shall always be interested in times $t \le \kappa_0 n$ for a positive constant $\kappa_0$ (we reserve the symbols $\kappa_1, \kappa_2, \dots$ for numerical constants which we assume not to depend upon $n$). We shall moreover assume the following regularity properties of the Markov chain:

1. The chain makes finite jumps. In other words, there exists a $\kappa_1 > 0$ such that $|X_{t+1}^{(i)} - X_t^{(i)}| \le \kappa_1$ almost surely.

2. The transition probabilities have a smooth $n \to \infty$ limit. In practice, there exist functions $\widehat{W}: \mathbb{Z}^{d+1} \times \mathbb{R}^{d+1} \to \mathbb{R}_+$ and a positive constant $\kappa_2$ such that

$$\bigl|W(\Delta \mid x) - \widehat{W}(\Delta \mid x/n)\bigr| \le \kappa_2/n. \qquad (4.3)$$

Clearly, we have $\sum_{\Delta}\widehat{W}(\Delta \mid x/n) = 1$. We shall moreover assume $\widehat{W}(\Delta \mid z)$ to be $C^2(\mathbb{R}^{d+1})$ with respect to its second argument and to have bounded first and second derivatives.

3. The process has a finite range on the $n$ scale. In practice, there exists $\kappa_3 > 0$ such that $|X_t| \le \kappa_3 n$ almost surely.

Under these hypotheses the distribution of $X_t$ is well described by a Gaussian whose mean and variance can be obtained by solving some ordinary differential equations. In order to state this fact in a more precise fashion, we need some additional notation. We denote by $\overline{X}_t = \mathbb{E}[X_t]$ the average of $X_t$ and by $D_t^{(ij)} = \mathbb{E}[X_t^{(i)}; X_t^{(j)}] = \mathbb{E}[X_t^{(i)}X_t^{(j)}] - \mathbb{E}[X_t^{(i)}]\,\mathbb{E}[X_t^{(j)}]$ its covariance. We furthermore need the first two moments of the transition rates $W(\Delta \mid x)$:

$$f^{(i)}(x) = \sum_{\Delta}\Delta_i\, W(\Delta \mid x), \qquad (4.4)$$

$$f^{(ij)}(x) = \sum_{\Delta}\Delta_i\Delta_j\, W(\Delta \mid x) - f^{(i)}(x)\, f^{(j)}(x), \qquad (4.5)$$

with $i, j \in \{0, \dots, d\}$. We shall call $\hat f^{(i)}(z)$, $\hat f^{(ij)}(z)$ the analogous quantities for the limiting rates $\widehat{W}(\Delta \mid z)$.

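Finally, let $z(\tau) \in \mathbb{R}^{d+1}$ and $\delta^{(ij)}(\tau) \in \mathbb{R}$, for $\tau \in \mathbb{R}_+$ and $i, j \in \{0, \dots, d\}$, denote the solution of

$$\frac{dz^{(i)}}{d\tau}(\tau) = \hat f^{(i)}\bigl(z(\tau)\bigr), \qquad (4.6)$$

$$\frac{d\delta^{(ij)}}{d\tau}(\tau) = \hat f^{(ij)}\bigl(z(\tau)\bigr) + \sum_{k=0}^{d}\left[\delta^{(ik)}(\tau)\,\frac{\partial \hat f^{(j)}}{\partial z^{(k)}}\bigl(z(\tau)\bigr) + \delta^{(kj)}(\tau)\,\frac{\partial \hat f^{(i)}}{\partial z^{(k)}}\bigl(z(\tau)\bigr)\right], \qquad (4.7)$$

with initial conditions $z(0) = x_0/n$ and $\delta^{(ij)}(0) = 0$.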
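Equations (4.6) and (4.7) are ordinary differential equations and can be integrated with any standard scheme. The sketch below uses a forward-Euler step. As a hypothetical test case unrelated to decoding it uses the Ehrenfest urn: $n$ balls, one chosen uniformly per step and moved to the other urn, with $X_t$ the number of balls in the first urn. Its limiting rates give $\hat f(z) = 1 - 2z$, $\hat f^{(00)}(z) = 1 - (1-2z)^2$ and $\partial\hat f/\partial z = -2$, so starting from $z(0) = 1$ the exact solution is $z(\tau) = \frac12 + \frac12 e^{-2\tau}$ and $\delta(\tau) = \frac14 - (\tau + \frac14)e^{-4\tau}$, which tends to the binomial value $1/4$.

```python
import math

def covariance_evolution(drift, diffusion, jacobian, z0, tau_end, dt=1e-4):
    """Forward-Euler integration of (4.6)-(4.7).

    drift(z) -> list f^(i)(z); diffusion(z) -> matrix f^(ij)(z);
    jacobian(z) -> matrix J[i][k] = d f^(i) / d z^(k).
    Returns z(tau_end) and the matrix delta^(ij)(tau_end), starting from delta(0) = 0.
    """
    d = len(z0)
    z = list(z0)
    delta = [[0.0] * d for _ in range(d)]
    for _ in range(int(round(tau_end / dt))):
        f, diff, jac = drift(z), diffusion(z), jacobian(z)
        ddelta = [[diff[i][j] + sum(delta[i][k] * jac[j][k] + delta[k][j] * jac[i][k]
                                    for k in range(d))
                   for j in range(d)] for i in range(d)]
        z = [z[i] + dt * f[i] for i in range(d)]
        delta = [[delta[i][j] + dt * ddelta[i][j] for j in range(d)] for i in range(d)]
    return z, delta

# Hypothetical test case: Ehrenfest urn in the scaled variable z = X/n.
ehrenfest = dict(
    drift=lambda z: [1.0 - 2.0 * z[0]],
    diffusion=lambda z: [[1.0 - (1.0 - 2.0 * z[0]) ** 2]],
    jacobian=lambda z: [[-2.0]],
)
```

For instance, `covariance_evolution(**ehrenfest, z0=[1.0], tau_end=1.0)` returns approximately $z = 0.568$ and $\delta = 0.227$. For decoding, the state would be $(\nu, \sigma, \tau)$ and the rates those of Lemma 4.1; a production code would use a higher-order integrator.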

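Proposition 4.1. Under the conditions stated above the following results hold (here we use the symbols $\Omega_0, \Omega_1, \dots$ for constants, independent of $n$, which we prove to exist):

I. $X_t$ concentrates on the $n$ scale. In formulae, there exists $\Omega_0 > 0$ such that

$$\mathbb{P}\bigl\{|X_t^{(i)} - \overline{X}_t^{(i)}| > \rho\bigr\} \le 2\, e^{-\Omega_0 \rho^2/n}. \qquad (4.8)$$

II. The average and covariance of $X_t$ are accurately tracked by $z(\tau)$ and $\delta^{(ij)}(\tau)$. More precisely, there exist constants $\Omega_1, \Omega_2 > 0$ such that

$$\left|\frac{1}{n}\overline{X}_t^{(i)} - z^{(i)}(t/n)\right| \le \frac{\Omega_1}{n}, \qquad (4.9)$$

$$\left|\frac{1}{n}D_t^{(ij)} - \delta^{(ij)}(t/n)\right| \le \frac{\Omega_2}{\sqrt{n}}. \qquad (4.10)$$

III. The variable $(X_t - \overline{X}_t)/\sqrt{n}$ converges weakly to a $(d+1)$-dimensional Gaussian with covariance $\delta^{(ij)}(t/n)$. More precisely, define the logarithmic moment generating function

$$\Lambda_t(\lambda) = \log\mathbb{E}\exp\Bigl\{\frac{1}{\sqrt{n}}\,\lambda\cdot(X_t - \overline{X}_t)\Bigr\} \qquad (4.11)$$

for $\lambda \in \mathbb{R}^{d+1}$. Then there exists a function $\lambda \mapsto \Omega_4(\lambda) \in \mathbb{R}_+$ such that

$$\left|\Lambda_t(\lambda) - \frac{1}{2}\sum_{i,j}\lambda_i\,\delta^{(ij)}(t/n)\,\lambda_j\right| \le \frac{\Omega_4(\lambda)}{\sqrt{n}}. \qquad (4.12)$$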
The proof is quite straightforward and will be outlined in Appendix A. Here we limit ourselves to a few comments.

Notice that the statements collected in the above proposition are not all independent. Equation (4.10) may for instance be regarded as a consequence of Eq. (4.12). The various results are presented in order of increasing sharpness. Also, not all of the assumptions in points 1-3 are needed to prove each of the statements in the proposition. For instance, the concentration result is an easy consequence of the Hoeffding-Azuma inequality and requires hypotheses 1 (uniformly bounded jumps) and 3 (scaling of time with $n$), plus some Lipschitz property of the drift coefficients $f^{(i)}(x)$, cf. Eq. (4.4). This point is further discussed in Appendix A. The limitation to a deterministic initial condition is easily removed. In iterative decoding applications the initial condition is a Gaussian distribution with standard deviation of order $\sqrt{n}$. Convolution with such a distribution amounts to integrating equation (4.7) with the initial covariance as initial condition. Finally, the situation investigated here can be regarded as a discrete analogue of the Freidlin-Wentzell theory of random perturbations of dynamical systems [9].

In the following subsection we shall apply the above analysis to two LDPC ensembles: the standard regular ensemble LDPC($n, x^{l-1}, x^{r-1}$) and the regular Poisson ensemble LDPC($n, x^{l-1}, r$). The general strategy is the following. (i) Determine a sufficient statistic for the decoding process. For a general LDPC($n, \lambda, \rho$) ensemble, a sufficient statistic is provided by the degree distributions at variable and check nodes in the residual graph. As we will see, a more compact representation is available for the two special cases mentioned above. (ii) Write the transition probability for iterative decoding and compute the drift and diffusion coefficients, cf. Eqs. (4.4) and (4.5). (iii) Determine the initial condition, namely the average state and its variance before the decoding process has been started. (iv) Integrate the density evolution and covariance evolution equations, cf. Eqs. (4.6) and (4.7), up to the critical point. The parameter $\alpha$ in Lemma 3.1 is finally given (up to a rescaling) by the standard deviation of the number of degree-one check nodes $s$ at the critical point. More precisely:

$$\alpha = \left|\frac{\partial\sigma}{\partial\epsilon}\right|^{-1}\sqrt{\delta^{(ss)}}, \qquad (4.13)$$

both factors being evaluated at the critical point.

4.2. Regular Ensembles. We will now show the explicit computations that need to be done in order to accomplish the program outlined in the previous subsection for the case of regular standard and Poisson ensembles.

There are some significant simplifications that arise in this case. Note that the triple $(v,s,t)$ constitutes a sufficient statistic, i.e., it suffices to keep track of the number of variable nodes (all of which have degree $l$, since by assumption the graph is regular), the number of degree-one check nodes, and the number of check nodes of degree two or higher. This can be seen as follows. We claim that all constellations of "type" $(v,s,t)$ have uniform probability. To see this, let $G_1$ and $G_2$ be two residual graphs of type $(v,s,t)$. Assume that $G_1$ is the result of applying the iterative decoder to the graph $G$ with a particular channel realization and a particular sequence of choices of the iterative decoder. It is then easy to see that there exists a graph $G'$ which differs from $G$ only on the residual part (where it coincides with $G_2$) but agrees with it otherwise. By definition of the ensemble, $G$ and $G'$ have equal probability, and if the iterative decoder is applied to $G'$ with the same channel realization and sequence of random choices, we get $G_2$. This shows that $G_1$ and $G_2$ (and therefore any residual graph which is compatible with the degree distribution) have equal probability. It follows that, given $(v,s,t)$, the distribution of the residual graph is determined, so that $(v,s,t)$ indeed constitutes a state.

Let us now determine the degree distribution of a "typical" element $G$ of type $(v,s,t)$, since this knowledge will be required in the sequel. For the standard ensemble define the generator polynomial $p(z) := (1+z)^r - rz - 1$, which counts the number of connections into a check node of degree two or higher. For the Poisson ensemble the equivalent function is $p(z) := e^z - z - 1$. Define $a(z) := z\,p'(z)/p(z)$. The total number of constellations on $t$ check nodes of degree at least two with $vl - s$ edges is easily seen to be $\mathrm{coef}\{p(x)^t, x^{vl-s}\}$. Let $t_i$, $i \ge 2$, denote the number of check nodes of degree $i$, and let $p_i$ denote the coefficient of $z^i$ in $p(z)$. Then the total number of constellations which are compatible with the desired type can be written as

$$\sum_{t_2, t_3, \dots:\ \sum_{i\ge2} t_i = t,\ \sum_{i\ge2} i\,t_i = vl-s}\binom{t}{t_2, t_3, \dots}\prod_{i\ge2} p_i^{t_i}.$$

Since all constellations have equal probability, a "typical" constellation will have the type which "dominates" the above sum. Some calculus reveals that this dominating type has the form

$$\tau_i = \frac{p_i z^i}{p(z)}\,\tau, \quad i \ge 2, \qquad (4.14)$$

where $\tau_i$, $i \ge 2$, denotes the fraction of check nodes of degree $i$, and where $z$ is the solution of $a(z) = \frac{vl - s}{t}$.

We will see shortly that for the Poisson case it suffices to consider ensembles of rate zero, since the scaling parameters for the general case can be easily connected to this case. Therefore, in the next lemma we can assume without loss of generality that the rate is zero for Poisson ensembles.

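Lemma 4.1. [Drift, Variance and Partial Derivatives for Regular Ensembles] Consider regular standard ensembles LDPC($n, x^{l-1}, x^{r-1}$) or regular Poisson ensembles LDPC($n, x^{l-1}, r = 0$). Define

$$p(z) := \begin{cases} (1+z)^r - 1 - rz, & \text{standard ensemble},\\ e^z - 1 - z, & \text{Poisson ensemble}, \end{cases}$$

let $a(z) := z\,p'(z)/p(z)$, and let $p_2$ denote the coefficient of $z^2$ in $p(z)$. Let $x_l$ denote the right-to-left erasure probability and let $\nu = \epsilon L(x_l)$ (with $L(x) = x^l$) denote the fraction of residual variable nodes. Then along the density evolution path parametrized by $x_l$ we have

$$f^{(\tau)} = -(l-1)\frac{2\tau_2}{\nu l}, \qquad f^{(\sigma)} = -1 - (l-1)\frac{\sigma}{\nu l} - f^{(\tau)},$$

$$f^{(\tau\tau)} = -f^{(\tau)}\Bigl(1 + \frac{f^{(\tau)}}{l-1}\Bigr), \qquad f^{(\sigma\tau)} = f^{(\tau)}\Bigl(1 + \frac{\sigma}{\nu l} + \frac{f^{(\tau)}}{l-1}\Bigr),$$

$$f^{(\sigma\sigma)} = (l-1)\frac{\sigma}{\nu l}\Bigl(1 - \frac{\sigma}{\nu l}\Bigr) - f^{(\tau)}\Bigl(1 + \frac{2\sigma}{\nu l} + \frac{f^{(\tau)}}{l-1}\Bigr),$$

$$\frac{\partial f^{(\sigma)}}{\partial\nu} = \frac{(l-1)\sigma}{\nu^2 l} - \frac{\partial f^{(\tau)}}{\partial\nu}, \qquad \frac{\partial f^{(\sigma)}}{\partial\sigma} = -\frac{l-1}{\nu l} - \frac{\partial f^{(\tau)}}{\partial\sigma}, \qquad \frac{\partial f^{(\sigma)}}{\partial\tau} = -\frac{\partial f^{(\tau)}}{\partial\tau},$$

$$\frac{\partial f^{(\tau)}}{\partial\nu} = \frac{2(l-1)}{\nu l}\Bigl(\frac{\tau_2}{\nu} - \frac{l\,p_2\, z\,(2-a(z))}{a'(z)\,p(z)}\Bigr),$$

$$\frac{\partial f^{(\tau)}}{\partial\sigma} = \frac{2(l-1)}{\nu l}\,\frac{p_2\, z\,(2-a(z))}{a'(z)\,p(z)},$$

$$\frac{\partial f^{(\tau)}}{\partial\tau} = -\frac{2(l-1)}{\nu l}\Bigl(\frac{\tau_2}{\tau} - \frac{p_2\, z\, a(z)\,(2-a(z))}{a'(z)\,p(z)}\Bigr),$$

where $\tau_2 = p_2 z^2\tau/p(z)$ as in (4.14), and where for the standard regular ensemble $z = \frac{\epsilon\lambda(x_l)}{1-\epsilon\lambda(x_l)}$, whereas for the Poisson regular ensemble $z = l\epsilon\lambda(x_l)$.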

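Proof. Let $\sigma$ denote the fraction of degree-one check nodes, $\tau_i$, $i \ge 2$, the fraction of degree-$i$ check nodes, and $\nu$ the fraction of residual variable nodes. Since the total edge count on the left and right must match up, we have $\sigma + \sum_{i\ge2} i\tau_i = \nu l$. A random edge therefore has probability $q_1 := \frac{\sigma}{\nu l}$ of being connected to a degree-one check node and probability $q_i := \frac{i\tau_i}{\nu l}$ of being connected to a degree-$i$ check node, $i \ge 2$. For large $n$, the joint probability distribution of all $l$ edges emanating from a variable node converges to the product distribution. It follows that (in this large-blocklength limit) the probability distribution (for a randomly chosen variable node) of having $u_1$ connections into degree-one check nodes and $u_2$ connections into degree-two check nodes is given by

$$w(u_1,u_2) := \binom{l}{u_1,\ u_2,\ l-u_1-u_2}\, q_1^{u_1} q_2^{u_2}\,(1-q_1-q_2)^{l-u_1-u_2}.$$

In the iterative decoding process, variables are not picked at random, though. A variable node is picked with a probability which is proportional to $u_1$. Therefore, the induced probability distribution under iterative decoding is

$$\tilde w(u_1,u_2) = \frac{w(u_1,u_2)\,u_1}{\sum_{u_1',u_2'} w(u_1',u_2')\,u_1'} = w(u_1,u_2)\,\frac{u_1}{l q_1}. \qquad (4.15)$$

Note that the generating function of $\tilde w(u_1,u_2)$ has the compact description

$$W(x,y) := \sum_{u_1,u_2}\tilde w(u_1,u_2)\,x^{u_1}y^{u_2} = x\,\bigl(xq_1 + yq_2 + (1-q_1-q_2)\bigr)^{l-1}.$$

When the chosen variable node is removed, each of its $u_1$ degree-one check neighbors drops to degree zero and each of its $u_2$ degree-two check neighbors becomes a degree-one check node, so that $\Delta s = u_2 - u_1$ and $\Delta t = -u_2$. In terms of $W(x,y)$ we have

$$f^{(\tau)} = -\sum_{u_1,u_2}\tilde w(u_1,u_2)\,u_2 = -\left.\frac{\partial W(x,y)}{\partial y}\right|_{x=y=1} = -(l-1)\,q_2 = -(l-1)\frac{2\tau_2}{\nu l},$$

$$f^{(\sigma)} = -\sum_{u_1,u_2}\tilde w(u_1,u_2)(u_1-u_2) = -\left.\Bigl(\frac{\partial W}{\partial x} - \frac{\partial W}{\partial y}\Bigr)\right|_{x=y=1} = -1 - (l-1)\,q_1 - f^{(\tau)} = -1 - (l-1)\frac{\sigma}{\nu l} - f^{(\tau)},$$

$$f^{(\tau\tau)} = \sum_{u_1,u_2}\tilde w(u_1,u_2)\,u_2^2 - \bigl(f^{(\tau)}\bigr)^2 = \left.\Bigl(\frac{\partial^2 W}{\partial y^2} + \frac{\partial W}{\partial y}\Bigr)\right|_{x=y=1} - \bigl(f^{(\tau)}\bigr)^2 = (l-1)(l-2)\,q_2^2 - f^{(\tau)} - \bigl(f^{(\tau)}\bigr)^2 = -f^{(\tau)}\Bigl(1 + \frac{f^{(\tau)}}{l-1}\Bigr),$$

$$f^{(\sigma\tau)} = \sum_{u_1,u_2}\tilde w(u_1,u_2)(u_1-u_2)\,u_2 - f^{(\sigma)}f^{(\tau)} = \left.\Bigl(\frac{\partial^2 W}{\partial x\,\partial y} - \frac{\partial^2 W}{\partial y^2} - \frac{\partial W}{\partial y}\Bigr)\right|_{x=y=1} - f^{(\sigma)}f^{(\tau)}$$
$$= -f^{(\tau)}\bigl(1 + (l-2)\,q_1\bigr) - f^{(\tau\tau)} - \bigl(f^{(\tau)}\bigr)^2 - f^{(\sigma)}f^{(\tau)} = f^{(\tau)}\Bigl(1 + \frac{\sigma}{\nu l} + \frac{f^{(\tau)}}{l-1}\Bigr),$$

$$f^{(\sigma\sigma)} = \sum_{u_1,u_2}\tilde w(u_1,u_2)(u_1-u_2)^2 - \bigl(f^{(\sigma)}\bigr)^2 = \left.\Bigl(\frac{\partial^2 W}{\partial x^2} + \frac{\partial W}{\partial x} - 2\frac{\partial^2 W}{\partial x\,\partial y} + \frac{\partial^2 W}{\partial y^2} + \frac{\partial W}{\partial y}\Bigr)\right|_{x=y=1} - \bigl(f^{(\sigma)}\bigr)^2$$
$$= (l-1)\,q_1\bigl(3 + (l-2)\,q_1\bigr) + 1 + 2f^{(\tau)}\bigl(1 + (l-2)\,q_1\bigr) + f^{(\tau\tau)} + \bigl(f^{(\tau)}\bigr)^2 - \bigl(f^{(\sigma)}\bigr)^2 = (l-1)\frac{\sigma}{\nu l}\Bigl(1 - \frac{\sigma}{\nu l}\Bigr) - f^{(\tau)}\Bigl(1 + \frac{2\sigma}{\nu l} + \frac{f^{(\tau)}}{l-1}\Bigr).$$

Next we need to determine the partial derivatives. From equation (4.14) for $i = 2$ we have

$$\frac{\partial f^{(\tau)}}{\partial\tau} = -\frac{2(l-1)}{\nu l}\,\frac{\partial\tau_2}{\partial\tau} = -\frac{2(l-1)}{\nu l}\Bigl(\frac{\tau_2}{\tau} + \tau\,\frac{d}{dz}\Bigl(\frac{p_2 z^2}{p(z)}\Bigr)\frac{\partial z}{\partial\tau}\Bigr) = -\frac{2(l-1)}{\nu l}\Bigl(\frac{\tau_2}{\tau} - \frac{p_2\, z\, a(z)\,(2-a(z))}{a'(z)\,p(z)}\Bigr),$$

where we used $\frac{d}{dz}\frac{z^2}{p(z)} = \frac{z\,(2-a(z))}{p(z)}$ and, from $a(z) = (\nu l - \sigma)/\tau$, $\frac{\partial z}{\partial\tau} = -\frac{a(z)}{\tau\,a'(z)}$. The remaining derivatives follow in the same way and we skip the details. Now note that along the typical decoding trajectory all quantities required to compute the above expressions are given by the density evolution equations (2.1) and (2.2).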
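The closed-form moments above can be cross-checked by brute force: enumerate $\tilde w(u_1,u_2)$ from (4.15), compute the mean and covariance of $(\Delta s, \Delta t) = (u_2 - u_1, -u_2)$ directly, and compare with the expressions for $f^{(\sigma)}, f^{(\tau)}, f^{(\sigma\sigma)}, f^{(\sigma\tau)}, f^{(\tau\tau)}$. The sketch treats $q_1 = \sigma/(\nu l)$ and $q_2$ as free inputs; the values used in the check are arbitrary and the function names are ours.

```python
from math import comb

def moments_by_enumeration(l, q1, q2):
    """Mean and covariance of (ds, dt) = (u2 - u1, -u2) under w~(u1, u2) of (4.15)."""
    e = {"s": 0.0, "t": 0.0, "ss": 0.0, "st": 0.0, "tt": 0.0}
    for u1 in range(1, l + 1):              # w~ vanishes for u1 = 0
        for u2 in range(0, l - u1 + 1):
            w = (comb(l, u1) * comb(l - u1, u2) * q1 ** u1 * q2 ** u2
                 * (1 - q1 - q2) ** (l - u1 - u2))   # multinomial w(u1, u2)
            wt = w * u1 / (l * q1)            # size-biased by u1, eq. (4.15)
            ds, dt = u2 - u1, -u2
            e["s"] += wt * ds
            e["t"] += wt * dt
            e["ss"] += wt * ds * ds
            e["st"] += wt * ds * dt
            e["tt"] += wt * dt * dt
    return (e["s"], e["t"], e["ss"] - e["s"] ** 2,
            e["st"] - e["s"] * e["t"], e["tt"] - e["t"] ** 2)

def moments_closed_form(l, q1, q2):
    """(f_sigma, f_tau, f_sigma_sigma, f_sigma_tau, f_tau_tau) from Lemma 4.1."""
    f_t = -(l - 1) * q2
    f_s = -1 - (l - 1) * q1 - f_t
    f_tt = -f_t * (1 + f_t / (l - 1))
    f_st = f_t * (1 + q1 + f_t / (l - 1))
    f_ss = (l - 1) * q1 * (1 - q1) - f_t * (1 + 2 * q1 + f_t / (l - 1))
    return f_s, f_t, f_ss, f_st, f_tt
```

Agreement to rounding error for several $(l, q_1, q_2)$ confirms the algebra of the proof, independently of how $q_1, q_2$ are later obtained from density evolution.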

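It remains to establish the link between $z$ and $x_l$. We start with standard ensembles. From the density evolution equation (2.2), the fraction of check nodes of degree $i \ge 2$ in the residual graph is

$$\tau_i = \frac{l}{r}\binom{r}{i}\bigl(\epsilon\lambda(x_l)\bigr)^i\bigl(1-\epsilon\lambda(x_l)\bigr)^{r-i} = \frac{l}{r}\bigl(1-\epsilon\lambda(x_l)\bigr)^r\binom{r}{i}\Bigl(\frac{\epsilon\lambda(x_l)}{1-\epsilon\lambda(x_l)}\Bigr)^i.$$

Comparing this to equation (4.14) (recall that $p_i = \binom{r}{i}$ in this case), it follows that $z = \frac{\epsilon\lambda(x_l)}{1-\epsilon\lambda(x_l)}$.

Recall that in the Poisson case we can assume that $r = 0$, so that $\rho(x) = R(x) = e^{l(x-1)}$. Again from (2.2),

$$\tau_2 = \frac{R''\bigl(1-\epsilon\lambda(x_l)\bigr)}{2}\bigl(\epsilon\lambda(x_l)\bigr)^2 = \frac{\bigl(l\epsilon\lambda(x_l)\bigr)^2}{2}\,e^{-l\epsilon\lambda(x_l)},$$

from which (together with the analogous expressions for $\tau_i$, $i > 2$, and $p_i = 1/i!$) it follows that for the Poisson case $z = l\epsilon\lambda(x_l)$.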



Figure 12 depicts $f^{(\sigma\sigma)}$, $f^{(\sigma\tau)}$ and $f^{(\tau\tau)}$ as a function of $\nu$ along the critical trajectory (i.e., for the choice $\epsilon = \epsilon^* \approx 0.4294$) for the LDPC($n, x^2, x^5$) ensemble.

Fig. 12: The evolution of $f^{(\sigma\sigma)}$ (dashed line), $f^{(\sigma\tau)}$ (dotted line) and $f^{(\tau\tau)}$ (solid line) along the critical trajectory for the LDPC($n, x^2, x^5$) ensemble.



The last piece of information required to apply the strategy outlined in the previous subsection consists in determining the initial condition for the density and covariance evolution. This is provided by the following lemmas, whose proofs are fairly routine and therefore left to the reader.

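Lemma 4.2. [Initial Condition for Standard Regular Ensembles] Consider transmission over the channel BEC($n, n\epsilon$) using a random element of LDPC($n, x^{l-1}, x^{r-1}$). Consider the residual graph (after reception of the transmitted word) and let $P_{\mathrm{init}}(s,t)$ denote the joint distribution of the numbers of check nodes of degree one and of degree at least two, respectively. Then

$$P_{\mathrm{init}}(s,t) = P_{\mathrm{Gauss}}(s,t)\bigl(1 + O(1/n)\bigr),$$

where $P_{\mathrm{Gauss}}(s,t)$ is a (discrete) Gaussian density with mean

$$\frac{1}{n}\mathbb{E}[s] = l\epsilon(1-\epsilon)^{r-1}, \qquad \frac{1}{n}\mathbb{E}[t] = \frac{l}{r}\Bigl(1 - (1-\epsilon)^r - r\epsilon(1-\epsilon)^{r-1}\Bigr),$$

and covariance (writing $\bar\epsilon := 1-\epsilon$)

$$\frac{1}{n}\mathbb{E}[s;s] = l\epsilon\bar\epsilon^{\,r-1}\Bigl(1 - \bar\epsilon^{\,r-2}\bigl(1 + r\epsilon((r-1)\epsilon - 1)\bigr)\Bigr),$$

$$\frac{1}{n}\mathbb{E}[s;t] = -l\epsilon\bar\epsilon^{\,r-1}\Bigl(1 - \bar\epsilon^{\,r-2}\bigl(1 + \epsilon((r-1)^2\epsilon - 1)\bigr)\Bigr),$$

$$\frac{1}{n}\mathbb{E}[t;t] = \frac{l\,\bar\epsilon^{\,r-1}}{r}\Bigl(1 + (r-1)\epsilon - \bar\epsilon^{\,r-2}\bigl(1 + \epsilon\,(2r - 3 + (r-3)(r-1)\epsilon + (r-1)^3\epsilon^2)\bigr)\Bigr).$$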
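These formulas are easy to check by simulation, because for a fixed set of $n\epsilon$ erased variables the erased check sockets of the standard (socket) ensemble form a uniformly random subset of size $n\epsilon l$ of the $nl$ check sockets. The sketch below is our own check, with arbitrarily chosen sizes and number of trials; it compares the empirical mean and variance of $s$ with the first mean and the first covariance formula of Lemma 4.2.

```python
import random

def initial_s_samples(n, l, r, eps, trials, seed=0):
    """Sample s = number of degree-one checks in the initial residual graph."""
    rng = random.Random(seed)
    m = n * l // r                    # number of check nodes
    k = round(n * eps) * l            # number of erased edge sockets
    samples = []
    for _ in range(trials):
        counts = [0] * m
        for sock in rng.sample(range(m * r), k):
            counts[sock // r] += 1    # sockets r*c, ..., r*c + r - 1 belong to check c
        samples.append(sum(1 for c in counts if c == 1))
    return samples

def predicted_s(n, l, r, eps):
    """Mean and variance of s according to Lemma 4.2."""
    eb = 1.0 - eps
    mean = n * l * eps * eb ** (r - 1)
    var = mean * (1 - eb ** (r - 2) * (1 + r * eps * ((r - 1) * eps - 1)))
    return mean, var
```

For $n = 1000$, $(l,r) = (3,6)$ and $\epsilon = 0.4$ the predicted mean and variance are about 93.3 and 52.2; a few hundred trials reproduce them within sampling error.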

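Lemma 4.3. [Initial Condition for Regular Poisson Ensemble] A statement analogous to Lemma 4.2 holds in the case of Poisson ensembles. For $r = 0$ the distribution of $s$ and $t$ is again a (discrete) Gaussian with mean

$$\frac{1}{n}\mathbb{E}[s] = l\epsilon\, e^{-l\epsilon}, \qquad \frac{1}{n}\mathbb{E}[t] = 1 - e^{-l\epsilon} - l\epsilon\, e^{-l\epsilon},$$

and covariance

$$\frac{1}{n}\mathbb{E}[s;s] = l\epsilon\, e^{-l\epsilon} - l\epsilon\,(1 - l\epsilon + l^2\epsilon^2)\, e^{-2l\epsilon},$$

$$\frac{1}{n}\mathbb{E}[s;t] = -l\epsilon\, e^{-l\epsilon} + l\epsilon\,(1 + l^2\epsilon^2)\, e^{-2l\epsilon},$$

$$\frac{1}{n}\mathbb{E}[t;t] = (1 + l\epsilon)\, e^{-l\epsilon} - (1 + 2l\epsilon + l^2\epsilon^2 + l^3\epsilon^3)\, e^{-2l\epsilon}.$$

Note that, as one would expect, the random variables $(s,t)$ are in general correlated.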

We can now solve equations (4.6) and (4.7). This allows us to track the evolution of the probability distribution of $s$ and $t$ as $v$ decreases from $n\epsilon$ to $0$, assuming that the $s = 0$ plane was not hit earlier.
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Example 2. [(3,6)-Ensemble] FigurefHlshows the evolution of <jM, flM, 
S^"" 1 for the LDPC(«,x 2 ,x 5 ) ensemble for the choice e = e* ss 0.42944. Notice 
that the variances of 5 and f can actually shrink as the decoding process evolves. 
This is an effect of the term in square brackets in equation J4.7J . In particular the 
variance shrinks to at v — if e is low enough (whenever decoding is successful 
with high probability). Finally, the parameter a is given by equation I4.13t . where 
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Fig. 13: The evolution of5^ ss > (dashed line), S^ st ' (dotted line) and <$'") (solid line) for the 
LDPC(n,x ,x 5 ) ensemble and the choice e = e* ~ 0.42944. 



the first factor can be computed as in equation J3. II . 

In Table 1 we report the values of $\epsilon^*$, $\alpha$, and $\beta$ for a few regular standard ensembles. Further explanations concerning the parameter $\beta$ are provided in Section 5.
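Once $(\epsilon^*, \alpha, \beta)$ are known, a finite-length estimate costs a single Gaussian tail evaluation. The sketch below assumes that the refined scaling form (1.3) reads $E[P_B] \approx Q\bigl(\sqrt{n}(\epsilon^* - \beta n^{-2/3} - \epsilon)/\alpha\bigr)$. That shape is consistent with the $n^{-1/6}$ correction discussed in Section 5, but it is our reading rather than a quotation. It also assumes that the tabulated $\alpha$ is the parameter entering that formula, and it uses $\beta \approx \beta/\Omega$ because $\Omega$ is close to 1. The function names are hypothetical.

```python
from math import erfc, sqrt

def q_function(x: float) -> float:
    """Gaussian tail Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2.0))

def block_erasure_scaling(n: int, eps: float, eps_star: float,
                          alpha: float, beta: float) -> float:
    """Assumed refined scaling form:
    Q(sqrt(n) * (eps_star - beta * n**(-2/3) - eps) / alpha)."""
    shifted_threshold = eps_star - beta * n ** (-2.0 / 3.0)
    return q_function(sqrt(n) * (shifted_threshold - eps) / alpha)

# (3,6) row of Table 1, taking beta = beta/Omega since Omega is close to 1
EPS_STAR, ALPHA, BETA = 0.4294, 0.249869, 0.616949
for n in (1024, 4096, 16384):
    print(n, block_erasure_scaling(n, 0.40, EPS_STAR, ALPHA, BETA))
```

The shift enters only through the effective threshold $\epsilon^* - \beta n^{-2/3}$, which moves toward $\epsilon^*$ as $n$ grows.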

The computation of the scaling parameters $\alpha = \alpha(l,r)$ and $\beta = \beta(l,r)$ for the Poisson case is made easier by the following pleasing relationship.

Lemma 4.4. [Scaling of Erasure Probability for Poisson Ensembles] Consider transmission over BEC$(n, n\epsilon)$ using elements from the regular Poisson ensemble LDPC$(n, x^{l-1}, r)$. For $l$ fixed and $(n, r, \epsilon)$ and $(n', r', \epsilon')$ such that $n\epsilon = n'\epsilon'$ and $(1-r)n = (1-r')n'$,

$$E_{\mathrm{LDPC}(n, x^{l-1}, r)}[P_B(G, n\epsilon)] = E_{\mathrm{LDPC}(n', x^{l-1}, r')}[P_B(G, n'\epsilon')],$$

$$n\, E_{\mathrm{LDPC}(n, x^{l-1}, r)}[P_b(G, n\epsilon)] = n'\, E_{\mathrm{LDPC}(n', x^{l-1}, r')}[P_b(G, n'\epsilon')].$$

Proof. We start with the statement regarding the block erasure probability. Compare transmission over BEC$(n, n\epsilon)$ using elements from LDPC$(n, x^{l-1}, r)$ to transmission over BEC$(n', n'\epsilon')$ using elements from LDPC$(n', x^{l-1}, r')$. The condition $n\epsilon = n'\epsilon'$ implies that the number of erased bits is the same in both cases. Decoding fails if these erased bits contain a stopping set. The condition $(1-r)n = (1-r')n'$ implies that the two ensembles have the same number of check nodes. Together with the fact that $l$ is the same in both cases (and therefore the number of edges involved is the same), this shows that the erasure probability is the same.

The proof regarding the bit erasure probability is almost identical. Both decoders get stuck in identical constellations. The factor $n$ takes into account what fraction of the overall codeword this constellation is. □

 l    r     ε*       α          β/Ω
 3    4     0.6473   0.260115   0.593632
 3    5     0.5176   0.263814   0.616196
 3    6     0.4294   0.249869   0.616949
 4    5     0.6001   0.241125   0.571617
 4    6     0.5061   0.246776   0.574356
 5    6     0.5510   0.228362   0.559688
 6    7     0.5079   0.280781   0.547797
 6    12    0.3075   0.170218   0.506326

Table 1: Thresholds and scaling parameters for some regular standard ensembles. The shift parameter is given as $\beta/\Omega$, where $\Omega$ is the universal constant stated in equation (5.16), whose numerical value is very close to 1.
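The $\epsilon^*$ column of Table 1 can be reproduced from the standard density-evolution characterization of the BEC belief-propagation threshold: $\epsilon^* = \min_{x \in (0,1]} x / \bigl(1 - (1-x)^{r-1}\bigr)^{l-1}$. The sketch below minimizes over a fine grid, which is crude but adequate to four decimals. The function name and grid size are illustrative choices.

```python
import numpy as np

def bp_threshold_regular(l: int, r: int, grid: int = 200_001) -> float:
    """BEC BP threshold of the (l, r)-regular ensemble:
    minimum over x in (0, 1] of x / (1 - (1 - x)**(r - 1))**(l - 1)."""
    x = np.linspace(1e-6, 1.0, grid)  # avoid x = 0, where the ratio blows up for l >= 3
    return float(np.min(x / (1.0 - (1.0 - x) ** (r - 1)) ** (l - 1)))

for l, r in [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6), (6, 7), (6, 12)]:
    print(l, r, round(bp_threshold_regular(l, r), 4))
```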

If we combine the above relationship with the general form of the scaling law, cf. equations (1.2) and (1.3) as well as Lemma 3.1, we get the following scaling relations.

Lemma 4.5. [Scaling of Scaling Parameters] Consider transmission over BEC$(n, n\epsilon)$ using elements of the Poisson ensemble LDPC$(n, x^{l-1}, r)$ with threshold $\epsilon^*(l,r)$. Assume that the scaling (1.3) holds and let $\alpha(l,r)$ and $\beta(l,r)$ denote the corresponding variance and shift parameters. Then

$$\epsilon^*(l,r') = \epsilon^*(l,r)\,\frac{1-r'}{1-r}, \tag{4.16}$$

$$\alpha(l,r') = \alpha(l,r)\left(\frac{1-r'}{1-r}\right)^{1/2}, \tag{4.17}$$

$$\beta(l,r') = \beta(l,r)\left(\frac{1-r'}{1-r}\right)^{1/3}. \tag{4.18}$$



Proof. The proof is elementary and we leave it to the reader. We note that in order to prove (4.16) and (4.17) only the simplified form of the scaling law (1.2) is required as hypothesis, and that this scaling law is proved in Lemma 3.1. □
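One way to carry out the omitted computation is sketched below. It assumes that (1.2) reads $E[P_B] \approx Q(\sqrt{n}(\epsilon^* - \epsilon)/\alpha)$ and that (1.3) replaces $\epsilon^*$ by $\epsilon^* - \beta n^{-2/3}$ inside the same expression. The shorthand $c$ is introduced only for this computation.

$$
\begin{aligned}
c &:= \frac{1-r'}{1-r}, \qquad n' = \frac{n}{c}, \qquad \epsilon' = c\,\epsilon \qquad \text{(the pairing of Lemma 4.4)},\\
\epsilon^*(l,r') &= c\,\epsilon^*(l,r) \qquad \text{(the block erasure probabilities coincide for every such pair)},\\
\frac{\sqrt{n'}\,\bigl(\epsilon^*(l,r') - \epsilon'\bigr)}{\alpha(l,r')} &= \frac{c^{1/2}\sqrt{n}\,\bigl(\epsilon^*(l,r) - \epsilon\bigr)}{\alpha(l,r')} \;\stackrel{!}{=}\; \frac{\sqrt{n}\,\bigl(\epsilon^*(l,r) - \epsilon\bigr)}{\alpha(l,r)} \;\Longrightarrow\; \alpha(l,r') = c^{1/2}\,\alpha(l,r),\\
\beta(l,r')\,n'^{-2/3} &= \beta(l,r')\,c^{2/3}\,n^{-2/3} \;\stackrel{!}{=}\; c\,\beta(l,r)\,n^{-2/3} \;\Longrightarrow\; \beta(l,r') = c^{1/3}\,\beta(l,r).
\end{aligned}
$$

The last line says that the shift, like the distance $\epsilon^* - \epsilon$, must scale by the factor $c$ when passing to the primed ensemble.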

From the above observations it follows that we have to determine the parameters $\epsilon^*(l,r)$, $\alpha(l,r)$ and $\beta(l,r)$ only for one rate $r$. This is the reason why so far we have only considered Poisson ensembles of zero rate. Our results will depend only on $l$. Relations (4.16)-(4.18) can be used to reintroduce the dependence upon $r$.

 l     ε*         α          β/Ω
 3     0.818469   0.497867   0.964528
 4     0.772280   0.409321   0.827849
 5     0.701780   0.375892   0.760593
 6     0.637081   0.354574   0.713490
 7     0.581775   0.337788   0.676647
 8     0.534997   0.323501   0.646335
 9     0.495255   0.310948   0.620646
 10    0.461197   0.299739   0.598429

Table 2: Thresholds and scaling parameters for some Poisson ensembles LDPC$(n, x^{l-1}, r)$. Note that these parameters assume that $r = 0$. Parameters for a generic rate can be obtained from these parameters through equations (4.16)-(4.18). The shift parameter is given as $\beta/\Omega$, where $\Omega$ is the universal constant stated in (5.16), whose numerical value is very close to 1.
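The $\epsilon^*$ column of Table 2 and relation (4.16) can both be checked with density evolution. The sketch relies on a standard fact about Poisson check-node degrees: with design rate $r$ and left degree $l$, the check-node degrees are Poisson with mean $\mu = l/(1-r)$. The edge-perspective check distribution is then $\rho(x) = e^{\mu(x-1)}$, so $\epsilon^* = \min_{x \in (0,1]} x / (1 - e^{-\mu x})^{l-1}$. The helper `rescale_to_rate` applies (4.16)-(4.18) starting from $r = 0$. Both names are hypothetical.

```python
import numpy as np

def poisson_threshold(l: int, rate: float = 0.0, grid: int = 200_001) -> float:
    """BEC BP threshold of the Poisson ensemble LDPC(n, x^(l-1), rate):
    minimum over x in (0, 1] of x / (1 - exp(-mu * x))**(l - 1), with mu = l / (1 - rate)."""
    mu = l / (1.0 - rate)
    x = np.linspace(1e-6, 1.0, grid)
    return float(np.min(x / (1.0 - np.exp(-mu * x)) ** (l - 1)))

def rescale_to_rate(eps0: float, alpha0: float, beta0: float, rate: float):
    """Apply (4.16)-(4.18) to move zero-rate parameters to design rate `rate`."""
    c = 1.0 - rate  # (1 - r') / (1 - r) with r = 0
    return eps0 * c, alpha0 * c ** 0.5, beta0 * c ** (1.0 / 3.0)

print(poisson_threshold(3))                                   # compare with the l = 3 row
print(rescale_to_rate(0.818469, 0.497867, 0.964528, rate=0.5))
```

Substituting $y = x/(1-r)$ in the minimization shows directly that the threshold scales by $1-r$, which is (4.16). The second test below checks this numerically.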

5. Computation of the Shift Parameter. In this section we explain in greater detail the arguments for Conjecture 3.1 and the procedure for computing the shift parameter $\beta$. As in the previous section, we shall first discuss this issue in an abstract setting, cf. Section 5.1. The general procedure will then be applied to regular standard and Poisson ensembles in Section 5.2.

5.1. The General Approach. Let us reconsider the setting of Section 4.1, i.e., a family of Markov chains $X_{n,0}, X_{n,1}, \dots, X_{n,t}, \dots$ taking values in $\mathbb{Z}^{d+1}$ and parametrized by the (large) integer $n$. As before, we will drop the subscript $n$ in the sequel to mitigate the notational burden. Throughout this section we shall assume the hypotheses of Proposition 4.1 to be fulfilled. Unlike in Section 4.1, we are interested in paths $X_0^t = \{X_0, X_1, \dots, X_t\}$ which are confined to the 'half space'

$$\mathbb{H}_+ = \{x = (x^{(0)}, \dots, x^{(d)}) \in \mathbb{Z}^{d+1} : x^{(0)} > 0\}. \tag{5.1}$$

We would like to estimate the 'survival' probability

$$P_t = P(X_0^t \subset \mathbb{H}_+). \tag{5.2}$$

Notice that $P_t$ depends implicitly on the initial condition $X_0 \in \mathbb{H}_+$. The coordinate $X_t^{(0)}$ should be thought of as (an abstraction of) the number $s$ of degree-one check nodes in the analysis of iterative decoding, cf. Section 3. The survival probability $P_t$ is therefore the probability of not having encountered a stopping set after $t$ steps of the decoding process. We are interested in a time window of length $O(n)$. Without loss of generality we may fix $\tau_{\max} > 0$ and consider $t \in \{0, \dots, t_{\max}\}$ with $t_{\max} = \lfloor n\tau_{\max} \rfloor$.




We shall denote by $z(\tau)$ the 'critical trajectory', i.e., a solution of the density evolution equations (4.6) such that $z^{(0)}(\tau^*) = 0$ and $z^{(0)}(\tau) > 0$ for any $\tau \in [0, \tau_{\max}]$, $\tau \neq \tau^*$. We call $z_0 = z(0)$ the corresponding initial condition. In order to make contact with the application to iterative decoding, we shall make the following assumptions.

A. As $n \to \infty$, we have $x_0 = nz_0 + \sqrt{n}\,z_1 + O(1)$, with $z_1 \in \mathbb{R}^{d+1}$ independent of $n$. This corresponds to the erasure probability $\epsilon$ being in the critical window $\epsilon^* - \epsilon = O(n^{-1/2})$.

B. Let $z_u(\tau)$, $u \in \mathbb{R}^{d+1}$, be a 'perturbed' critical trajectory obtained by solving the density evolution equations (4.6) with initial condition $z_u(\tau^*) = z(\tau^*) + u$. As for the critical trajectory, we consider this solution in the interval $[0, \tau_{\max}]$ and take $u$ such that $|u| < \varepsilon$ with $\varepsilon$ small enough. We assume that there exist a positive $u$-independent constant $\kappa_1$ and a function $u \mapsto a(u)$ such that

$$z_u^{(0)}(\tau) - z_u^{(0)}(\tau^*) \ge a(u)(\tau - \tau^*) + \kappa_1(\tau - \tau^*)^2$$

for any $\tau \in [0, \tau_{\max}]$.

C. We finally assume that $a(u)$ can be chosen in such a way that $|a(u)| \le \kappa_2|u|$ for some positive constant $\kappa_2$.

Notice that assumptions B and C above can be easily checked on the 'continuum' transition rates $W(\Delta|z)$ introduced in Section 4.1. The situation considered here mimics the one found in iterative decoding of unconditionally stable ensembles.

Consider the survival probability $P_{t_{\max}}$ at the 'latest' time. As we have seen in Section 4.1, most of the trajectories $X_0^{t_{\max}}$ are concentrated within $\sqrt{n}$ of $nz(t/n)$. Therefore the absolute minimum of $X_t^{(0)}$ in the interval $\{0, \dots, t_{\max}\}$ will be realized for a $t$ 'close' to $n\tau^*$. If this absolute minimum is positive, the corresponding trajectory contributes to $P_{t_{\max}}$; otherwise it does not.
In order to formalize this argument, fix $t^* = \lfloor n\tau^* \rfloor$. Then

$$P_{t_{\max}} = \sum_{x \in \mathbb{H}_+} P(X_0^{t_{\max}} \subset \mathbb{H}_+ \mid X_{t^*} = x)\, P(X_{t^*} = x). \tag{5.3}$$

Thanks to Proposition 4.1 we can accurately estimate the factor $P(X_{t^*} = x)$. The term $P(X_0^{t_{\max}} \subset \mathbb{H}_+ \mid X_{t^*} = x)$ is the probability that the global minimum of $X_t^{(0)}$, $t \in \{0, \dots, t_{\max}\}$, is positive, conditioned on $X_{t^*} = x$. Let us denote by $t_g$ a 'time' at which the global minimum is realized. More precisely, $t_g \in \{0, \dots, t_{\max}\}$ is a random variable such that $X_{t_g}^{(0)} \le X_t^{(0)}$ for all $t \in \{0, \dots, t_{\max}\}$. Call $z_X(\tau)$ the perturbed critical trajectory defined above with perturbation vector $u = X_{t^*}/n - z(\tau^*)$. In other words, we perturb the critical trajectory by an $O(1/\sqrt{n})$ amount in order to match it to the particular (finite-$n$) realization of the Markov process we are dealing with within the critical region. Concentration arguments analogous to the ones used to prove point I of Proposition 4.1 imply that, for a given $t$,

$$P\bigl\{|X_t - nz_X(t/n)| > \delta\sqrt{|t - t^*|}\bigr\} \le \Omega_1 e^{-\Omega_2\delta^2}$$

for some positive constants $\Omega_1$ and $\Omega_2$ (as before, we use these symbols to denote generic constants which are proven to exist independently of $n$). In fact a stronger condition holds true: by Doob's maximal inequality [16, p. 227], for $T$ fixed

$$P\Bigl\{\max_{|t - t^*| \le T} |X_t - nz_X(t/n)| > \delta\sqrt{T}\Bigr\} \le \Omega_1 e^{-\Omega_2\delta^2} \tag{5.4}$$

for some (possibly different) constants $\Omega_1$ and $\Omega_2$.

Fig. 14: A pictorial view of decoding trajectories near the critical point. The type of trajectory depicted here is responsible for the shift appearing in the refined scaling form (1.3).

Using this fact we can prove a useful result:

Lemma 5.1. Assume the same hypotheses as in Lemma 4.1 plus A, B and C above. Let $t_g$ be a time at which the absolute minimum of $X_t^{(0)}$ is realized, for $t \in \{0, \dots, t_{\max}\}$. Then there exist positive constants $\Omega_1$, $\Omega_2$ and $\delta_0$, and a function $n_0(\delta)$ such that, for any $\delta > \delta_0$ and $n > n_0(\delta)$,

$$P\bigl\{|t_g - t^*| < \delta^{2/3}n^{2/3} \;\wedge\; X_{t_g}^{(0)} > X_{t^*}^{(0)} - \delta^{4/3}n^{1/3}\bigr\} \ge 1 - \Omega_1\exp[-\Omega_2\delta^2]. \tag{5.5}$$

The proof is deferred to Appendix C. The content of this lemma is illustrated in Fig. 14.

The above result implies that corrections to the simplified scaling of Lemma 3.1 can be estimated through a two-step procedure. In a nutshell: (i) compute the probability for $X_{t^*}^{(0)}$ to be of order $n^{1/3}$; (ii) evaluate the probability for $X_t^{(0)}$ to remain positive, conditioned on a given $X_{t^*}^{(0)}$ of order $n^{1/3}$.

5.1.1. Distribution of $X_{t^*}^{(0)}$. The simplified scaling form, cf. Lemma 3.1, was obtained by approximating the first factor in equation (5.3) by 1. The leading correction to this approximation comes from trajectories such that $X_{t^*}^{(0)} = O(n^{1/3})$. Because of Proposition 4.1, the probability distribution of $X_{t^*}^{(0)}$ (second factor) is well approximated by a Gaussian with center at $O(\sqrt{n})$ and variance of order $n$. The probability of having $X_{t^*}^{(0)} = O(n^{1/3})$ is therefore of order $n^{1/3} \cdot n^{-1/2} = n^{-1/6}$. This explains why the correction term in the refined scaling form (1.3) is of order $n^{-1/6}$.

This argument can be made more precise by rewriting equation (5.3) as

$$P_{t_{\max}} = P(X_{t^*}^{(0)} > 0) - \sum_{x \in \mathbb{H}_+} P(X_{t_g}^{(0)} \le 0 \mid X_{t^*} = x)\, P(X_{t^*} = x). \tag{5.6}$$

The first term corresponds to the simplified scaling form. We shall hereafter focus on the second one, $P_{\mathrm{corr}} = P(X_{t^*}^{(0)} > 0) - P_{t_{\max}}$. Notice that $P(X_{t_g}^{(0)} \le 0 \mid X_{t^*} = x)$ varies much more rapidly in $x^{(0)}$ (on a scale of order $n^{1/3}$) than in the other coordinates (on a scale of order $n$). It is therefore useful to introduce the notation $\underline{x} = (x^{(1)}, \dots, x^{(d)})$ (and analogously $\underline{X}$ and $\underline{z}$), which explicitly distinguishes the last $d$ coordinates of $x$. Since $P(X_{t^*} = x)$ varies on a scale $n^{1/2}$, we can safely approximate it by setting the coordinate $x^{(0)}$ to 0:

$$P_{\mathrm{corr}} = \sum_{\underline{x}}\Bigl\{\sum_{x^{(0)} > 0} P\bigl(X_{t_g}^{(0)} \le 0 \mid X_{t^*} = (x^{(0)}, \underline{x})\bigr)\Bigr\}\, P\bigl(X_{t^*} = (0, \underline{x})\bigr)\bigl(1 + O(n^{-1/6})\bigr).$$

The term in curly brackets depends on $\underline{x}$ only through the transition coefficients in a neighborhood of $x$ and therefore varies on a scale of order $n$. This point will be discussed in detail in the next section. On the contrary, $P(X_{t^*} = (0, \underline{x}))$ is peaked around $n\underline{z}(\tau^*)$ with a width of order $\sqrt{n}$. Therefore

$$P_{\mathrm{corr}} = \sum_{x^{(0)} > 0} P\bigl(X_{t_g}^{(0)} \le 0 \mid X_{t^*} = (x^{(0)}, n\underline{z}(\tau^*))\bigr)\, P(X_{t^*}^{(0)} = 0)\bigl(1 + O(n^{-1/6})\bigr), \tag{5.7}$$

where we recall that $\underline{z}(\tau^*)$ denotes the last $d$ coordinates of the critical point.
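The second factor can be evaluated easily using density and covariance evolution. Let us consider the application to iterative decoding (here $X^{(0)} = s$). Note that at the critical point and within the critical window $X_{t^*}^{(0)}$ is Gaussian with mean $\eta(\epsilon - \epsilon^*)n$ and variance $\delta^{(ss)}n$, where $\eta$ denotes the derivative with respect to $\epsilon$ of the density-evolution value of $s/n$ at the critical point. We therefore have

$$P(X_{t^*}^{(0)} = 0) = \frac{1}{\sqrt{2\pi n\,\delta^{(ss)}}}\exp\Bigl\{-\frac{n\,\eta^2(\epsilon - \epsilon^*)^2}{2\delta^{(ss)}}\Bigr\}\bigl(1 + O(n^{-1/2})\bigr).$$

This formula can indeed be guessed without any computation at all. The probability of $X_{t^*}^{(0)} = 0$ must in fact be proportional to the derivative of the probability of having $X_{t^*}^{(0)} \le 0$, which is given by equation (1.3) within the critical window.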

5.1.2. Distribution of the Global Minimum. We are left with the task of estimating the first factor in equation (5.7), and more generally the probability distribution of $X_{t_g}^{(0)}$ conditioned on $X_{t^*}$. Lemma 5.1 is, once again, quite helpful. The difference $|t_g - t^*|$ is small on the scale $n$ on which the transition rates are state-dependent. This suggests that the leading correction to the simplified scaling depends on the transition rates only through their behavior at the critical point $z(\tau^*)$. On the other hand, $|t_g - t^*|$ is large on the scale $O(1)$ of a single step. We can therefore hope to compute the leading correction within a 'continuum' approach.

More precisely, define the rescaled trajectory $u(\cdot) \in \mathbb{R}^{d+1}$ by taking

$$u^{(0)}\bigl(n^{-2/3}(t - t^*)\bigr) = n^{-1/3}X_t^{(0)}, \tag{5.8}$$

$$u^{(i)}\bigl(n^{-2/3}(t - t^*)\bigr) = n^{-2/3}\bigl(X_t^{(i)} - X_{t^*}^{(i)}\bigr), \quad i = 1, \dots, d, \tag{5.9}$$

for integers $t$ such that $|t - t^*| \le \theta_{\max}n^{2/3}$, and interpolating linearly among these points. A textbook result in the theory of stochastic processes [22] implies the following lemma.

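Lemma 5.2. Let $X$ be distributed as above under the condition $X_{t^*} = (n^{1/3}\xi, n\underline{z}(\tau^*))$. The process $u(\cdot)$ defined in equations (5.8) and (5.9) converges as $n \to \infty$ to a diffusion process with generator

$$\mathcal{L} = \frac{1}{2}f^{(00)}\frac{\partial^2}{\partial (u^{(0)})^2} + \Bigl(\sum_{i=1}^d f^{(0)}_{,i}\,u^{(i)}\Bigr)\frac{\partial}{\partial u^{(0)}} + \sum_{i=1}^d f^{(i)}\frac{\partial}{\partial u^{(i)}}, \tag{5.10}$$

conditioned on $u^{(0)}(0) = \xi$ and $\underline{u}(0) = 0$. In the above formula we used the notation

$$f^{(i)} = f^{(i)}(z(\tau^*)), \qquad f^{(ij)} = f^{(ij)}(z(\tau^*)), \qquad f^{(0)}_{,i} = \frac{\partial f^{(0)}}{\partial z_i}\Big|_{z(\tau^*)}.$$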
In order not to burden the presentation, the proof of this statement is postponed to Appendix D. Notice that the only role of $\theta_{\max}$ in the above lemma is to ensure that $u(\theta)$ stays within a finite neighborhood of $u(0)$ with high probability. We want to use the process $u(\theta)$ in order to compute the first factor in equation (5.7), and therefore the distribution of the absolute minimum of $u^{(0)}(\theta)$. Let us call $\theta_g$ the location of the minimum. Lemma 5.1 implies that $|\theta_g| < \delta^{2/3}$ with probability at least $1 - \Omega_1\exp(-\Omega_2\delta^2)$. We can therefore safely let $\theta_{\max} \to \infty$ and consider the diffusion process defined above for $\theta \in (-\infty, +\infty)$.

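Notice that only the first derivative with respect to the coordinates $u^{(1)}, \dots, u^{(d)}$ appears in equation (5.10). The process $\underline{u}(\theta)$ is therefore deterministic: $u^{(i)}(\theta) = f^{(i)}\theta$ for $i = 1, \dots, d$. We can substitute this behavior in equation (5.10) and deduce that $u^{(0)}(\theta)$ is a time-dependent diffusion process with generator

$$\mathcal{L}_\theta = \frac{1}{2}f^{(00)}\frac{\partial^2}{\partial (u^{(0)})^2} + \Bigl(\sum_{i=1}^d f^{(0)}_{,i}f^{(i)}\Bigr)\theta\,\frac{\partial}{\partial u^{(0)}}. \tag{5.11}$$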
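It is convenient to rescale $u^{(0)}$ and $\theta$ in order to reduce the above generator to a standard form:

$$\theta \to (f^{(00)})^{-1/3}\Bigl(\sum_{i=1}^d f^{(0)}_{,i}f^{(i)}\Bigr)^{2/3}\theta, \qquad w = (f^{(00)})^{-2/3}\Bigl(\sum_{i=1}^d f^{(0)}_{,i}f^{(i)}\Bigr)^{1/3}u^{(0)}. \tag{5.12}$$

The generator for $w(\theta)$ now has the form (we keep the same name $\theta$ for the rescaled time, with an abuse of notation)

$$\mathcal{L}_\theta = \theta\,\frac{\partial}{\partial w} + \frac{1}{2}\frac{\partial^2}{\partial w^2}. \tag{5.13}$$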
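The exponents in (5.12) follow from a direct change of variables. Write $D = f^{(00)}$ and $A = \sum_{i=1}^d f^{(0)}_{,i}f^{(i)}$ (shorthands used only here), so that (5.11) describes $du^{(0)} = A\,\theta\,d\theta + \sqrt{D}\,dB_\theta$. Substituting $\theta = a\tilde\theta$ and $u^{(0)} = b\,w$, and using Brownian scaling $B_{a\tilde\theta} = \sqrt{a}\,\tilde B_{\tilde\theta}$, one finds

$$
\begin{aligned}
dw &= \frac{A a^2}{b}\,\tilde\theta\,d\tilde\theta + \frac{\sqrt{D a}}{b}\,d\tilde B_{\tilde\theta},\\
\frac{A a^2}{b} = 1,\quad \frac{\sqrt{D a}}{b} = 1 &\;\Longrightarrow\; a = D^{1/3}A^{-2/3}, \qquad b = D^{2/3}A^{-1/3},\\
\tilde\theta = D^{-1/3}A^{2/3}\,\theta, &\qquad w = D^{-2/3}A^{1/3}\,u^{(0)}.
\end{aligned}
$$

The last line is (5.12), and the rescaled process has drift $\tilde\theta$ and unit diffusion, as in (5.13). The fractional powers of $A$ are real because $A > 0$. Indeed, $f^{(0)}$ vanishes at the critical point, so $A$ equals the second derivative of $z^{(0)}(\tau)$ at $\tau^*$, and assumptions B and C with $u = 0$ bound that derivative below by $2\kappa_1$.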

A little thought shows that the generator (5.13) is equivalent to saying that $w(\theta) = w(0) + \theta^2/2 + B(\theta)$, with $B(\theta)$ a two-sided standard Brownian motion with $B(0) = 0$. The problem of computing the distribution of the global minimum of such a process has been solved in [10]. Adapting the results of this paper we find

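$$P\bigl(w(\theta_g) - w(0) < -z\bigr) = 1 - K(z)^2, \tag{5.14}$$

where

$$K(z) = \frac{1}{2}\int_{-\infty}^{+\infty}\frac{\mathrm{Ai}(iy)\,\mathrm{Bi}(2^{1/3}z + iy) - \mathrm{Ai}(2^{1/3}z + iy)\,\mathrm{Bi}(iy)}{\mathrm{Ai}(iy)}\,dy,$$

with $\mathrm{Ai}(\cdot)$ and $\mathrm{Bi}(\cdot)$ the Airy functions defined in [1].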
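Putting everything together we get our final result

$$\sum_{x^{(0)} > 0} P\bigl(X_{t_g}^{(0)} \le 0 \mid X_{t^*} = (x^{(0)}, n\underline{z}(\tau^*))\bigr) = n^{1/3}\,\Omega\,(f^{(00)})^{2/3}\Bigl(\sum_{i=1}^d f^{(0)}_{,i}f^{(i)}\Bigr)^{-1/3}\bigl(1 + o(1)\bigr),$$

with

$$\Omega = \int_0^\infty \bigl[1 - K(z)^2\bigr]\,dz. \tag{5.16}$$

A numerical computation yields $\Omega = 1.00(1)$.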
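Because (5.14) identifies $1 - K(z)^2$ with the tail $P(w(0) - w(\theta_g) > z)$, equation (5.16) says that $\Omega$ is simply the mean depth $E[w(0) - \min_\theta w(\theta)]$ of the global minimum below the starting point. This gives a crude cross-check that needs no Airy functions. The Monte Carlo sketch below assumes the standardized process $w(\theta) = w(0) + \theta^2/2 + B(\theta)$ and truncates $\theta$ to $[-4, 4]$, where the parabola makes larger excursions negligible. The function name, grid step and sample size are illustrative choices. Because the grid misses part of each excursion, the estimate is biased slightly low; it should still land in the vicinity of the quoted value.

```python
import numpy as np

def estimate_omega(n_paths=4000, theta_max=4.0, dt=2e-3, seed=0, batch=500):
    """Monte Carlo estimate of E[w(0) - min_theta w(theta)] for the standardized
    process w(theta) = w(0) + theta**2 / 2 + B(theta), B a two-sided standard BM.
    theta is truncated to [-theta_max, theta_max] and sampled on a grid of step dt."""
    rng = np.random.default_rng(seed)
    steps = int(round(theta_max / dt))
    theta = dt * np.arange(1, steps + 1)
    drift = 0.5 * theta ** 2
    depths = []
    for start in range(0, n_paths, batch):
        m = min(batch, n_paths - start)
        lowest = np.zeros(m)  # w(theta) - w(0) equals 0 at theta = 0
        for _side in range(2):  # theta > 0 and theta < 0 are independent copies
            increments = rng.normal(0.0, np.sqrt(dt), size=(m, steps))
            path = np.cumsum(increments, axis=1) + drift  # w(theta) - w(0) on one side
            lowest = np.minimum(lowest, path.min(axis=1))
        depths.append(-lowest)
    return float(np.concatenate(depths).mean())

omega_hat = estimate_omega()
print(f"Monte Carlo estimate of Omega: {omega_hat:.3f}")
```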

5.2. Application to Regular Standard and Poisson Ensembles. There is one important difficulty in applying the general scheme explained above to iterative decoding: the Markov process is not defined for $s < 0$. Recall that $s$ corresponds, in this context, to the 'critical' variable $X_t^{(0)}$. On the other hand, both the drift and diffusion coefficients $f^{(i)}(\cdot)$ and $f^{(ij)}(\cdot)$ can be continued analytically through the $s = 0$ plane. Since the final result (5.16) depends on the transition rates only through these quantities, we are quite confident that it remains correct also for iterative decoding applications.

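Conjecture 5.1. [Shift Parameter for Regular Standard Ensembles] Consider the regular standard ensemble LDPC$(n, x^{l-1}, x^{r-1})$ or the regular Poisson ensemble LDPC$(n, x^{l-1}, r)$. Then

$$\beta/\Omega = \frac{(f^{(00)})^{2/3}}{|\eta|}\Bigl(\sum_{i=1}^d \frac{\partial f^{(0)}}{\partial z_i}\frac{dz_i}{d\tau}\Bigr)^{-1/3}, \tag{5.17}$$

where $\eta$ denotes the derivative with respect to $\epsilon$ of the density-evolution value of $s/n$ at the critical point, and all quantities are evaluated at the critical point.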



For the regular standard ensemble LDPC$(n, x^{l-1}, x^{r-1})$ define



?(*) 



IL2 ( f)^"'"'' 



ft(*) = (l-l) 



2(2)^ 



If=2 G)*'>-'* 



Then 



/?/rt = 



oVr 

Ik 



-1 



2\ 2/3 A'(z)g(z)-1/J'(z)\ _1/3 



1-1 



Tg'(z) 



where z — ex and all parameters are taken at the critical point. 



The generic equation (5.17) follows directly from equation (5.16), applied to the iterative decoding setting. For regular standard ensembles these expressions can be made somewhat more explicit. First we note that at the critical point $f^{(\sigma)} = -1 - f^{(\tau)}$, since with probability approaching one (as $n$ tends to infinity) the variable node which is peeled off has (only) one check node of degree one attached to it.⁶ Since $f^{(\sigma)} = 0$ at the critical point, it follows that $f^{(\tau)} = -1$. Using again the relationship $f^{(\sigma)} = -1 - f^{(\tau)}$, some calculations show that $f^{(\sigma\sigma)} = \frac{l-2}{l-1}$, and that the derivatives of $f^{(\sigma)}$ entering (5.17) can be expressed as indicated.
APPENDIX

A. Covariance Evolution for a General Markov Process. In this Section we reconsider the abstract setting of Section 4.1 and outline a proof of Proposition 4.1 under the assumptions 1-3.

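Proof. We start with statement I, whose proof is fairly standard. Define a Doob martingale $\hat X_0, \dots, \hat X_t$,

$$\hat X_s = E[X_t^{(i)} \mid X_0, \dots, X_s].$$

Note that $\hat X_t = X_t^{(i)}$ and $\hat X_0 = E[X_t^{(i)}] = \bar x_t^{(i)}$, so that

$$P\{|X_t^{(i)} - \bar x_t^{(i)}| > \rho\} = P\{|\hat X_t - \hat X_0| > \rho\}.$$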
Therefore, by the Hoeffding-Azuma inequality we will have proven (4.8) if we can show that $\hat X_0, \dots, \hat X_t$ has bounded differences; more specifically, if we can show that

$$|\hat X_s - \hat X_{s-1}| \le \sqrt{\Omega_0}, \qquad 1 \le s \le t.$$

To accomplish this task note that

$$|\hat X_s - \hat X_{s-1}| \le \sup_{y,z}\bigl|E[X_t^{(i)} \mid X_0, \dots, X_{s-1}, X_s = y] - E[X_t^{(i)} \mid X_0, \dots, X_{s-1}, X_s = z]\bigr|, \tag{A.1}$$

where the sup is taken over all $y$ and $z$ such that the trajectories $\{X_0, \dots, X_{s-1}, X_s = y\}$ and $\{X_0, \dots, X_{s-1}, X_s = z\}$ have non-vanishing probability. Consider therefore two realizations of the Markov chain which coincide up to time $s-1$ but are independent afterwards. Denote them by $X_0, X_1, \dots$ and $Y_0, Y_1, \dots$, respectively, where by our assumption $X_r = Y_r$ for $0 \le r \le s-1$, but the processes evolve independently for $r \ge s$. Since by assumption $|X_s^{(i)} - X_{s-1}^{(i)}| \le \kappa_1$ and $|Y_s^{(i)} - Y_{s-1}^{(i)}| \le \kappa_1$ almost surely, it follows that $|X_s^{(i)} - Y_s^{(i)}| \le 2\kappa_1$ almost surely. Define $\delta X_r = X_r - Y_r$ and $\delta\bar X_r = \bar X_r - \bar Y_r$, where $\bar X_r$ and $\bar Y_r$ denote the expectations of $X_r$ and $Y_r$ conditioned on $X_0, \dots, X_s$ and $Y_0, \dots, Y_s$, respectively. Then we have for $s \le r < t$
dX%, < SX { Sn\/ i] (Xr)-f (i) (Yr)\} < 

n n 

Here we approximated $f^{(i)}(X_r) - f^{(i)}(Y_r)$ by $f^{(i)}(X_r/n) - f^{(i)}(Y_r/n)$ and then used the fact that $f^{(i)}(z)$ has bounded derivative. By Gronwall's Lemma we now get $|\bar X_t^{(i)} - \bar Y_t^{(i)}| \le \sqrt{\Omega_0}$ for some suitable constant $\Omega_0$. Since $\bar X_t^{(i)} = E[X_t^{(i)} \mid X_0, \dots, X_{s-1}, X_s = y]$ for some particular choice of $y$ (and some fixed 'past' $X_0, \dots, X_{s-1}$), and the equivalent statement is true for $Y_t$, it follows from (A.1) that $|\hat X_s - \hat X_{s-1}| \le \sqrt{\Omega_0}$.
Notice that equation (4.8) implies

$$E|X_t - \bar x_t|^p \le a_p\,(n\tau)^{p/2}, \tag{A.2}$$

for some⁷ positive constants $a_p$ (here $\tau = t/n$). Before passing to the following parts of the Proposition, let us notice that not all the assumptions on the transition rates $W(\Delta|z)$ were used here. It is in fact sufficient to assume that the drifts $f^{(i)}(z)$ are Lipschitz continuous.
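The Hoeffding-Azuma inequality used above states that a martingale $M_0, M_1, \dots$ with $|M_s - M_{s-1}| \le c$ satisfies $P(|M_t - M_0| \ge \rho) \le 2\exp(-\rho^2/(2tc^2))$. Applied with $c = \sqrt{\Omega_0}$, it gives a Gaussian-type tail of the kind stated in (4.8). The toy sketch below uses a symmetric $\pm 1$ random walk as a stand-in martingale, not the decoding chain, and compares the bound with an empirical tail. The function names are illustrative.

```python
import numpy as np

def azuma_bound(rho, t, c):
    """Hoeffding-Azuma: P(|M_t - M_0| >= rho) <= 2 exp(-rho^2 / (2 t c^2))
    for a martingale whose increments are bounded by c in absolute value."""
    return 2.0 * np.exp(-rho ** 2 / (2.0 * t * c ** 2))

def empirical_tail(rho, t, trials=100_000, seed=0):
    """Tail of a symmetric +/-1 random walk after t steps (a martingale with c = 1).
    Its endpoint is distributed as 2 * Binomial(t, 1/2) - t."""
    rng = np.random.default_rng(seed)
    endpoints = 2 * rng.binomial(t, 0.5, size=trials) - t
    return float(np.mean(np.abs(endpoints) >= rho))

t = 400
for rho in (20, 40, 60):
    print(rho, empirical_tail(rho, t), azuma_bound(rho, t, 1.0))
```

The bound is loose by a constant factor in the exponent here. What matters for the argument is that it holds for any martingale with bounded increments, whatever their distribution.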

Let us now consider point II. A simple computation shows that

$$E X_{t+1}^{(i)} = E X_t^{(i)} + E f^{(i)}(X_t), \tag{A.3}$$

$$E[X_{t+1}^{(i)}; X_{t+1}^{(j)}] = E[X_t^{(i)}; X_t^{(j)}] + E f^{(ij)}(X_t) + E[X_t^{(i)}; f^{(j)}(X_t)] + E[f^{(i)}(X_t); X_t^{(j)}] + E[f^{(i)}(X_t); f^{(j)}(X_t)]. \tag{A.4}$$

⁷ One has in fact $a_p = p\sqrt{\pi/2}\,E|Z|^p$, with $Z$ a standard Gaussian variable.
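Consider the first of these equations and notice that, approximating $f^{(i)}(X_t)$ by $f^{(i)}(X_t/n)$, one obtains

$$\bigl|\bar x_{t+1}^{(i)} - \bar x_t^{(i)} - f^{(i)}(\bar x_t/n)\bigr| \le \frac{C}{n} + \bigl|E[f^{(i)}(X_t/n) - f^{(i)}(\bar x_t/n)]\bigr|. \tag{A.5}$$

Since the second derivative of $f^{(i)}(z)$ is bounded (say by $B$), we have the estimate

$$\bigl|E[f^{(i)}(X_t/n) - f^{(i)}(\bar x_t/n)]\bigr| \le \Bigl|\sum_j \frac{\partial f^{(i)}}{\partial z_j}\Big|_{\bar x_t/n}\frac{E[X_t^{(j)} - \bar x_t^{(j)}]}{n}\Bigr| + \frac{B}{n^2}\,E\|X_t - \bar x_t\|^2 \le \frac{C}{n},$$

where the first term vanishes because $E X_t = \bar x_t$, and the second is controlled by (A.2). Summing equation (A.5) over $t$ and applying Gronwall's Lemma we get

$$\bigl|\bar x_t^{(i)}/n - z^{(i)}(t/n)\bigr| \le \frac{A}{n}. \tag{A.6}$$

Notice that if we limit ourselves to assuming Lipschitz continuous drift coefficients $f^{(i)}(z)$, the same derivation yields a slightly weaker result: $|\bar x_t^{(i)}/n - z^{(i)}(t/n)| \le A/\sqrt{n}$.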

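Equation (4.10) is proved from (A.4) much in the same way, the crucial input being an estimate on $E|X_t - \bar x_t|^3$, once again obtained from equation (4.8). Here we limit ourselves to sketching how the various terms emerge. We start by rewriting equation (A.4) in the form

$$E[X_{t+1}^{(i)}; X_{t+1}^{(j)}] = E[X_t^{(i)}; X_t^{(j)}] + f^{(ij)}(\bar x_t/n) + \frac{1}{n}\sum_k \frac{\partial f^{(j)}}{\partial z_k}\Big|_{\bar x_t/n} E[X_t^{(i)}; X_t^{(k)}] + \frac{1}{n}\sum_k \frac{\partial f^{(i)}}{\partial z_k}\Big|_{\bar x_t/n} E[X_t^{(k)}; X_t^{(j)}] + R_t^{(0)} + R_t^{(1)} + R_t^{(2)} + R_t^{(3)} + R_t^{(4)} + R_t^{(5)},$$

with the remainders listed below:

$$R_t^{(0)} = E[f^{(ij)}(X_t) - f^{(ij)}(X_t/n)] + E[f^{(ij)}(X_t/n) - f^{(ij)}(\bar x_t/n)],$$

$$R_t^{(1)} = E[X_t^{(i)}; f^{(j)}(X_t) - f^{(j)}(X_t/n)],$$

$$R_t^{(2)} = E\Bigl[X_t^{(i)}; f^{(j)}(X_t/n) - f^{(j)}(\bar x_t/n) - \frac{1}{n}\sum_k \frac{\partial f^{(j)}}{\partial z_k}\Big|_{\bar x_t/n}\bigl(X_t^{(k)} - \bar x_t^{(k)}\bigr)\Bigr],$$

$$R_t^{(5)} = E[f^{(i)}(X_t); f^{(j)}(X_t)],$$

while $R_t^{(3)}$ and $R_t^{(4)}$ are obtained from $R_t^{(1)}$ and $R_t^{(2)}$ by exchanging $i$ and $j$. Each of these terms can be bounded separately as in the derivation of Eq. (A.6). Consider for instance $R_t^{(1)}$:

$$|R_t^{(1)}| \le E\bigl[|X_t^{(i)} - \bar x_t^{(i)}|^2\bigr]^{1/2}\,E\bigl[|f^{(j)}(X_t) - f^{(j)}(X_t/n) - f^{(j)}(\bar x_t) + f^{(j)}(\bar x_t/n)|^2\bigr]^{1/2} \le \frac{C}{\sqrt{n}},$$

where we used the estimate (A.2).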

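Let us finally consider part III of the proposition, as stated in equation (4.12). It is easy to derive the following recursion for the generating function:

$$\Lambda_{t+1}(\lambda) = \Lambda_t(\lambda) + \log\hat W(\lambda/\sqrt{n} \mid \bar x_t) - \frac{1}{\sqrt{n}}\lambda\cdot(\bar x_{t+1} - \bar x_t) + \log\frac{E\bigl[\hat W(\lambda/\sqrt{n} \mid X_t)\,e^{\lambda\cdot(X_t - \bar x_t)/\sqrt{n}}\bigr]}{E\bigl[\hat W(\lambda/\sqrt{n} \mid \bar x_t)\,e^{\lambda\cdot(X_t - \bar x_t)/\sqrt{n}}\bigr]}. \tag{A.7}$$

Here we defined the jump generating function

$$\hat W(\lambda \mid x) = \sum_\Delta e^{\lambda\cdot\Delta}\,W(\Delta \mid x).$$

The proof of equation (4.12) is completed by estimating the various terms in equation (A.7) as follows:

$$\Bigl|\log\hat W(\lambda/\sqrt{n} \mid \bar x_t) - \frac{1}{\sqrt{n}}\lambda\cdot(\bar x_{t+1} - \bar x_t) - \frac{1}{2n}\sum_{i,j}f^{(ij)}(\bar x_t/n)\,\lambda_i\lambda_j\Bigr| \le \frac{a(\lambda)}{n^{3/2}},$$

$$\Bigl|\frac{E\bigl[\bigl(\hat W(\lambda/\sqrt{n} \mid X_t) - \hat W(\lambda/\sqrt{n} \mid \bar x_t)\bigr)\,e^{\lambda\cdot(X_t - \bar x_t)/\sqrt{n}}\bigr]}{E\bigl[\hat W(\lambda/\sqrt{n} \mid \bar x_t)\,e^{\lambda\cdot(X_t - \bar x_t)/\sqrt{n}}\bigr]} - \frac{1}{n}\sum_{i,j}\lambda_i\frac{\partial f^{(i)}}{\partial z_j}\Big|_{\bar x_t/n}\frac{\partial\Lambda_t}{\partial\lambda_j}\Bigr| \le \frac{a(\lambda)}{n^{3/2}}.$$

We leave to the reader the pleasure of proving these two last (straightforward) inequalities. □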

B. Unconditionally Stable Ensembles: Proof of the Scaling Law. In this Appendix we prove Lemma 3.1. The idea is to regard iterative decoding as a Markov process in the space of states⁸ $x = (v_G, s_G, t_G) \in \mathbb{Z}^3$. The transition rates and the initial condition for such a process are computed in Section 4.2. As in Section 4.1, we denote by $z = x/n = (\nu_G, \sigma_G, \tau_G)$ the normalized state and by $z(\tau)$ the critical trajectory. This is the solution of the density evolution equations (4.6) such that $z(\tau_{\mathrm{end}}) = (0, 0, 0)$, corresponding to complete decoding, $\sigma_G(\tau^*) = 0$ for some $\tau^* \in (0, \tau_{\mathrm{end}})$, and $\sigma_G(\tau) > 0$ for any $\tau \in (0, \tau_{\mathrm{end}})$, $\tau \neq \tau^*$.

It would be tempting to use the general covariance evolution approach provided by Proposition 4.1. However, a simple remark prevents us from following this route in the most straightforward fashion. Proposition 4.1 was proved under the assumption that the transition rates $W(\Delta|z)$ become, in the $n \to \infty$ limit, $C^2(\mathbb{R}^{d+1})$ functions of $z$. On the other hand, the decoding process is well defined only if $s_G > 0$, and we are interested in trajectories passing close to the $s_G = 0$ plane. In more concrete terms, Proposition 4.1 cannot be true when $z(\tau)$ is at a distance of order $1/\sqrt{n}$ from the $s_G = 0$ plane. The least that will happen is that a part of the Gaussian density is 'cut away'.

⁸ For the sake of definiteness, we refer here to the case of regular ensembles, the extension to general unconditionally stable ensembles being trivial. Also, we use the subscript G for the state coordinates in order to distinguish them from the time parameters $t$ and $\tau$.

As a way to overcome this problem, we introduce a new Markov process on the same states $x = (v_G, s_G, t_G)$ which is well defined for $s_G < 0$. We extend the transition rates computed in the proof of Lemma 4.1 to $s_G < 0$ by setting $\sigma_G = 0$ there. More precisely we have

$$\Delta v_G = -1, \qquad \Delta s_G = -u_1 + u_2, \qquad \Delta t_G = -u_2, \tag{B.1}$$

with $u_1$ and $u_2$ distributed according to $w(u_1, u_2)$, see equation (4.15), where we put $q_1 = 0$ and $q_2 = 2t_2/(lv)$, and $t_2$ is determined as in (4.14). Notice that the only non-zero entries of the distribution $w(u_1, u_2)$ in the $s_G < 0$ space are therefore

$$w(1, u_2) = \binom{l-1}{u_2}\,q_2^{u_2}(1 - q_2)^{l-1-u_2}.$$

Such transition rates do not necessarily correspond to any graph process in the $s_G < 0$ half-space. However, upon conditioning on $s_G > 0$ the 'extended' process coincides with the original one. Therefore the probability of not leaving the $s_G > 0$ half-space (the 'survival' probability) can be calculated on the extended process. Finally, let us notice that the precise form of this extension is immaterial as long as some requirements are met. Call $W(\Delta|x)$ the transition rates of the extended Markov process. We require that:

• The chain makes finite jumps.

• The rates are well approximated by their continuum counterpart $W(\Delta|z)$. As in Section 4.1, this means that $|W(\Delta|x) - W(\Delta|x/n)| \le \kappa/n$.

• The continuum transition rates are $C^2$ with bounded derivatives in the region $\{\nu_G > \varepsilon, \sigma_G > \varepsilon, \tau_G > \varepsilon\}$ for any $\varepsilon > 0$.

• There exists a $\delta > 0$ such that the continuum drift coefficients are Lipschitz continuous uniformly in the region $\mathrm{Crit}(\delta) = \{z : |z - z(\tau^*)| < \delta\}$. This means that $|f_i(z) - f_i(z')| \le \kappa'|z - z'|$ for some positive $\kappa'$ and any pair of points $z, z' \in \mathrm{Crit}(\delta)$.

These requirements are easily checked on the extension defined above.

Recall from Lemma 3.1 that we are only interested in decoding errors of size at least $\gamma\nu^* n$, where $\nu^* := \nu_G(\tau^*)$ is the critical point (measured in terms of the fractional size of the graph) and $\gamma$ is any number in $(0, 1)$. In particular $\gamma$ is non-negative but can be chosen arbitrarily small. For ensembles with $\lambda'(0) = 0$ a simple union bound shows that the decoder will be successful with high probability once the residual graph is sufficiently small, but if $\lambda'(0) > 0$ then small deficiencies in the graph can contribute non-negligibly to the error probability. Therefore, by choosing $\gamma \in (0, 1)$, we 'separate out' the contributions to the block error probability which stem from large error events.

Call $P_{\mathrm{end}}$ the probability of not hitting the $s_G = 0$ plane until $v_G = \lfloor n\gamma\nu^* \rfloor$. Fix $\tau_{\max}$ so that $\nu_G(\tau_{\max}) = \gamma\nu^*$. Define $P_t$ to be the survival probability up to time $t$. It will be useful to denote by $P_t(x', t')$ the probability of surviving up to time $t$ conditioned on having survived up to time $t'$ and on the state at time $t'$ being $x'$.

In order to apply Proposition 4.1 as far as we can, we decompose the time up to $t_{\max}$ into two intervals: $\{0, \dots, t^*_-\}$ and $\{t^*_- + 1, \dots, t_{\max}\}$. The survival probability can be written as

$$P_{t_{\max}} = \sum_x P_{t_{\max}}(x, t^*_-)\, P(x, t^*_- \mid x_0, 0). \tag{B.2}$$

Here $P(x', t' \mid x, t)$ denotes the probability of arriving in state $x'$ at time $t'$ without hitting the $s_G = 0$ plane, conditioned on being in state $x$ at time $t$. The sum over $x$ runs over the $s_G > 0$ half-space.

Next we choose $t^*_- = \lfloor n(\tau^* - \varepsilon) \rfloor$ for some (small) positive number $\varepsilon$. With this choice the factor $P(x, t^*_- \mid x_0, 0)$ in the above equation can be estimated using the covariance evolution approach and Proposition 4.1. The reason is that the trajectories contributing to this factor stay at a distance of order $n$ from the $s_G = 0$ plane, apart from some exponentially rare cases. We leave to the reader the task of adapting the proof of Proposition 4.1 III to this situation.

The first factor in equation (B.2) cannot be estimated through covariance evolution. Fortunately, a less refined calculation is sufficient in this case. In fact the Lipschitz continuity of the drift coefficients ensures that, at any time $t > t^*_-$, the state is within $\delta$ of the density evolution prediction with probability at least $1 - \exp[-\delta^2/(2\Omega(t - t^*_-))]$. This fact was stressed in the proof of Proposition 4.1, cf. Appendix A. For any state $x$, consider the solution $z(\tau; x)$ of the density evolution equations (4.6) with initial condition $z(t^*_-/n; x) = x/n$. Let $\hat P_{t_{\max}}(x, t^*_-) = 0$ if $z(\tau; x)$ intersects the $\sigma_G = 0$ plane in the interval $[t^*_-/n, \tau_{\max}]$, and $\hat P_{t_{\max}}(x, t^*_-) = 1$ otherwise. The above concentration result implies that $\hat P_{t_{\max}}(x, t^*_-)$ is a good approximation for $P_{t_{\max}}(x, t^*_-)$.

Let us prove the last statement in the cases in which $z(\tau; x)$ does not intersect the $\sigma_G = 0$ plane (and therefore $\hat P_{t_{\max}}(x, t^*_-) = 1$). If $x$ is distributed according to $P(x, t^*_- \mid x_0, 0)$, the trajectory $z(\tau; x)$ will stay at a distance of order $1/\sqrt{n}$ from the critical one. In particular, its minimum distance from the $\sigma_G = 0$ plane will be $\gamma/\sqrt{n}$ with $\gamma$ of order 1. This minimum will be achieved for $\tau$ close to $\tau^*$ with high probability. We therefore restrict ourselves to an interval of times $t^*_- \le t \le t^*_- + nT\varepsilon$ for some fixed number $T > 1$, and neglect the cases in which the $\sigma_G = 0$ plane is touched outside this interval. The error implied in substituting $P_{t_{\max}}(x, t^*_-)$ with $\hat P_{t_{\max}}(x, t^*_-)$ is upper bounded by the probability that the maximum distance between the actual decoding trajectory and $z(\tau; x)$ in the interval $t^*_- \le t \le t^*_- + nT\varepsilon$ (i.e., $\tau^* - \varepsilon \le \tau \le \tau^* + (T-1)\varepsilon$) is larger than $\gamma\sqrt{n}$. Using the above concentration result with $\delta = \gamma\sqrt{n}$ and $t - t^*_- \le nT\varepsilon$, we get

$$\bigl|P_{t_{\max}}(x, t^*_-) - \hat P_{t_{\max}}(x, t^*_-)\bigr| \le \exp\Bigl\{-\frac{\gamma^2}{2\Omega T\varepsilon}\Bigr\}. \tag{B.3}$$

As mentioned above, under the distribution $P(x, t^*_- \mid x_0, 0)$, both $\gamma$ and $T$ are, with high probability, $O(1)$ (both with respect to $n \to \infty$ and $\varepsilon \to 0$). Therefore the right-hand side of equation (B.3) can be made arbitrarily small by taking $\varepsilon \to 0$.
The last step consists in substituting $\hat P_{t_{\max}}(x, t^*_-)$ for $P_{t_{\max}}(x, t^*_-)$ and the Gaussian density from covariance evolution for $P(x, t^*_- \mid x_0, 0)$ in equation (B.2), and letting $n \to \infty$ with $n^{1/2}(\epsilon - \epsilon^*)$ fixed. This yields Lemma 3.1 up to corrections which vanish when $\varepsilon \to 0$.

C. Proof of Lemma 5.1. In this Appendix we present a proof of Lemma 5.1 making use of Doob's maximal inequality (5.4). We shall prove that each of the two events considered in Eq. (5.5) occurs with probability greater than $1 - \Omega_1\exp[-\Omega_2\delta^2]$. This implies the thesis by a simple union bound, plus a rescaling of the constants $\Omega_1$, $\Omega_2$.

Let us begin by considering the second event, namely $X_{t_g}^{(0)} > X_{t^*}^{(0)} - \delta^{4/3}n^{1/3}$. For the sake of simplicity we redefine $t_g$ to be the position of the global minimum of $X_t^{(0)}$ in the domain $t \ge t^*$. The minimum with an unrestricted $t$ can be treated by putting together the cases $t \ge t^*$ and $t \le t^*$. It is also useful to define



Equation \5A\ implies 

<-SVf J- <f2 ie - n2S \ (C.l) 



P < min 

|^o<r<r 



1 o Kid 
Y t --t 2 + ^=t 
n \/n 



where we rescaled the constants K2 and J?2- 

Let $\{t_l : l \in \mathbb{Z}\}$ be a non-decreasing sequence of real numbers with $t_l \to \infty$ as $l \to \infty$ and $t_l \to 0$ as $l \to -\infty$. A union bound yields

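$$
\begin{aligned}
\mathbb{P}\Big\{\min_{t \ge 0} Y_t < -\delta^{4/3} n^{1/3}\Big\}
&\le \sum_{l \in \mathbb{Z}} \mathbb{P}\Big\{\min_{t_l \le t < t_{l+1}} Y_t < -\delta^{4/3} n^{1/3}\Big\} \\
&\le \sum_{l \in \mathbb{Z}} \mathbb{P}\Big\{\min_{t_l \le t < t_{l+1}} \Big[Y_t - \frac{1}{n}\,t^2 + \frac{\kappa_1\delta}{\sqrt{n}}\,t\Big] < -\delta^{4/3} n^{1/3} - \frac{1}{n}\,t_l^2 + \frac{\kappa_1\delta}{\sqrt{n}}\,t_{l+1}\Big\} \\
&\le \kappa_2 \sum_{l \in \mathbb{Z}} \exp\Big\{-\Omega_2\,\frac{\big(\delta^{4/3} n^{1/3} + \frac{1}{n}\,t_l^2 - \frac{\kappa_1\delta}{\sqrt{n}}\,t_{l+1}\big)^2}{t_{l+1}}\Big\},
\end{aligned}
$$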

where we used Eq. (C.1) in the last inequality. At this point we choose $t_l = 2^l (n\delta)^{2/3}$. Plugging into the above expression we get

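$$
\mathbb{P}\Big\{\min_{t \ge 0} Y_t < -\delta^{4/3} n^{1/3}\Big\} \le \kappa_2 \sum_{l \in \mathbb{Z}} \exp\Big\{-\Omega_2\,\frac{\delta^2}{2^{l+1}}\big(1 + 2^{2l} - \kappa_1\delta^{1/3} n^{-1/6}\, 2^{l+1}\big)^2\Big\}.
$$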
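Substituting $t_l = 2^l(n\delta)^{2/3}$ is pure algebra: $t_l^2/n = 2^{2l}\delta^{4/3}n^{1/3}$, $t_{l+1} = 2^{l+1}(n\delta)^{2/3}$, and factoring $\delta^{4/3}n^{1/3}$ out of the squared term produces the bracket. The sketch below checks that identity numerically, together with the effect of the condition $n \ge (2\kappa_1)^6\delta^2$, under which $\kappa_1\delta^{1/3}n^{-1/6}2^{l+1} \le 2^l$. It assumes the exponent has the form $(\delta^{4/3}n^{1/3} + t_l^2/n - \kappa_1\delta\, t_{l+1}/\sqrt{n})^2/t_{l+1}$, with linear coefficient $\kappa_1\delta/\sqrt{n}$ as in (C.1); the function names and parameter values are hypothetical.

```python
import math


def exponent_before(n, delta, kappa1, l):
    """(delta^{4/3} n^{1/3} + t_l^2/n - kappa1*delta*t_{l+1}/sqrt(n))^2 / t_{l+1}."""
    scale = (n * delta) ** (2.0 / 3.0)  # t_l = 2^l * scale
    t_l, t_next = 2.0**l * scale, 2.0 ** (l + 1) * scale
    gap = (delta ** (4.0 / 3.0) * n ** (1.0 / 3.0)
           + t_l**2 / n
           - kappa1 * delta * t_next / math.sqrt(n))
    return gap**2 / t_next


def exponent_after(n, delta, kappa1, l):
    """delta^2 / 2^{l+1} * (1 + 2^{2l} - kappa1 delta^{1/3} n^{-1/6} 2^{l+1})^2."""
    bracket = (1.0 + 4.0**l
               - kappa1 * delta ** (1.0 / 3.0) * n ** (-1.0 / 6.0) * 2.0 ** (l + 1))
    return delta**2 / 2.0 ** (l + 1) * bracket**2


def n0(delta, kappa1):
    """Threshold n_0(delta) = (2*kappa1)^6 * delta^2."""
    return (2.0 * kappa1) ** 6 * delta**2


for l in range(-3, 4):
    print(l, exponent_before(1e6, 2.0, 1.5, l), exponent_after(1e6, 2.0, 1.5, l))
```

For $n \ge n_0(\delta)$ each exponent dominates $\delta^2(1 + 2^{2l} - 2^l)^2/2^{l+1}$, which no longer depends on $n$.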



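If $n > n_0(\delta) := (2\kappa_1)^6 \delta^2$, then $\kappa_1\delta^{1/3} n^{-1/6}\, 2^{l+1} < 2^l$ for every $l$, and since $1 + 2^{2l} - 2^l > 0$ we get

$$
\mathbb{P}\Big\{\min_{t \ge 0} Y_t < -\delta^{4/3} n^{1/3}\Big\} \le \kappa_2 \sum_{l \in \mathbb{Z}} \exp\Big\{-\Omega_2\,\delta^2\,\frac{\big(1 + 2^{2l} - 2^l\big)^2}{2^{l+1}}\Big\}.
$$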
It is an elementary exercise to show that the right-hand side is smaller than $\Omega_1 \exp\{-\Omega_2' \delta^2\}$ for some (possibly different) positive constants $\Omega_1$ and $\Omega_2'$ and any $\delta > \delta_0$.
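One way to carry out this exercise (not necessarily the authors' route): write $c_l = (1 + 2^{2l} - 2^l)^2/2^{l+1}$. Over $l \in \mathbb{Z}$ its minimum is $c_0 = 1/2$, since for $l \le -1$ the numerator is at least $9/16$ and the denominator at most $1$, while $c_l$ grows for $l \ge 1$. For $\delta \ge \delta_0$, splitting each exponent in half gives $e^{-\Omega_2\delta^2 c_l} \le e^{-\Omega_2\delta^2 c_{\min}/2}\, e^{-\Omega_2\delta_0^2 c_l/2}$, so the sum is at most $\Omega_1 e^{-\Omega_2'\delta^2}$ with $\Omega_1 = \sum_l e^{-\Omega_2\delta_0^2 c_l/2}$ and $\Omega_2' = \Omega_2/4$. The sketch checks this numerically; the truncation of $l$ to a finite range and all names are assumptions.

```python
import math

L_RANGE = range(-60, 31)  # finite truncation of the sum over l in Z (assumption)


def c(l):
    """c_l = (1 + 2^{2l} - 2^l)^2 / 2^{l+1}, the coefficient of Omega_2*delta^2."""
    return (1.0 + 4.0**l - 2.0**l) ** 2 / 2.0 ** (l + 1)


def dyadic_sum(delta, omega2):
    """Truncated version of sum_l exp(-Omega_2 * delta^2 * c_l)."""
    return sum(math.exp(-omega2 * delta**2 * c(l)) for l in L_RANGE)


def tail_constants(delta0, omega2):
    """(Omega_1, Omega_2') with dyadic_sum(d) <= Omega_1*exp(-Omega_2'*d^2) for d >= delta0."""
    c_min = min(c(l) for l in L_RANGE)  # attained at l = 0, where c_0 = 1/2
    omega1 = sum(math.exp(-omega2 * delta0**2 * c(l) / 2.0) for l in L_RANGE)
    return omega1, omega2 * c_min / 2.0


omega1, omega2p = tail_constants(delta0=0.5, omega2=1.0)
for d in (0.5, 1.0, 2.0, 5.0):
    print(d, dyadic_sum(d, 1.0), omega1 * math.exp(-omega2p * d**2))
```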

The second part of the proof consists in proving an analogous upper bound for the probability of having $|t_g - t^*| > \delta^{2/3} n^{2/3}$. In fact the proof proceeds as for the first event. One splits the semi-infinite interval $t > t^*$ into intervals $[t_l, t_{l+1})$ with $t_l = 2^l (n\delta)^{2/3}$ and (this time) $l \ge 0$, and then applies Doob's maximal inequality to each interval. We leave to the reader the pleasure of filling in the details.

D. Convergence to diffusion process. In this Appendix we prove Lemma 5.2 as a straightforward application of the following statement, which can be found in [22].

THEOREM D.1. Let $\{X_t\}$ be a Markov process with values in $\mathbb{R}^d$ and transition probability $\pi_h(x, dy)$, with $0 < h < 1$ and initial condition $X_0 = x_0$. Let $P_{h,x_0}$ be the measure induced on the space of continuous trajectories $\Omega = C([0,\infty), \mathbb{R}^d)$ by the mapping $X(th) = X_t$ for integer $t$, interpolating linearly in between. Assume that the limit

$$
\lim_{h \to 0} \frac{1}{h}\int \big[\phi(y) - \phi(x)\big]\,\pi_h(x, dy) = (\mathcal{L}\phi)(x) \tag{D.1}
$$

exists uniformly in a compact $K \subset \mathbb{R}^d$ for functions $\phi \in C^\infty(K)$. Assume that the limit has the form

$$
(\mathcal{L}\phi)(x) = \frac{1}{2}\sum_{i,j} a_{ij}(x)\,\frac{\partial^2 \phi}{\partial x_i\,\partial x_j}(x) + \sum_i b_i(x)\,\frac{\partial \phi}{\partial x_i}(x), \tag{D.2}
$$

with continuous and uniformly bounded coefficients $a = \{a_{ij}(x)\}$ ($a$ being a positive definite matrix) and $b = \{b_i(x)\}$. Assume finally that the solution of the martingale problem for $\mathcal{L}$ is unique, yielding a Markov family of measures $P_x$ on $\Omega$. Then $\{P_{h,x}\}$ converges to $\{P_x\}$ as $h \to 0$.
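Theorem D.1 can be illustrated on the simplest chain satisfying its hypotheses (a hypothetical example, unrelated to the chain of Lemma 5.2): on $\mathbb{R}$, $X_{t+1} = X_t + bh + \sqrt{h}\,\xi_t$ with i.i.d. equiprobable $\xi_t = \pm 1$. Then $\frac{1}{h}\mathbb{E}[\phi(x + bh + \sqrt{h}\xi) - \phi(x)] \to b\phi'(x) + \frac{1}{2}\phi''(x)$, which is (D.2) with $a = 1$ and constant $b$; the limit is Brownian motion with drift $b$. The sketch checks only the time-1 marginal, a weaker statement than convergence of path measures. After $m = 1/h$ steps, $X_m = x_0 + b + (2B - m)/\sqrt{m}$ with $B \sim \mathrm{Binomial}(m, 1/2)$; since $x_0 + b$ shifts the chain and the limit alike, the code works with $y = z - x_0 - b$. The function name and grid are assumptions.

```python
import math


def kolmogorov_distance(m):
    """max over a grid of |P(X_m - x0 - b <= y) - Phi(y)| for the hypothetical chain
    with h = 1/m, where X_m - x0 - b = (2B - m)/sqrt(m) and B ~ Binomial(m, 1/2)."""
    prefix, running = [], 0
    for j in range(m + 1):  # prefix[j] = sum_{i <= j} C(m, i), exact integers
        running += math.comb(m, j)
        prefix.append(running)
    total = 2**m
    worst = 0.0
    for i in range(801):  # grid y in [-4, 4]
        y = -4.0 + 0.01 * i
        k = math.floor((m + y * math.sqrt(m)) / 2.0)  # X_m - x0 - b <= y  <=>  B <= k
        chain_cdf = 0.0 if k < 0 else prefix[min(k, m)] / total
        limit_cdf = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))  # standard normal CDF
        worst = max(worst, abs(chain_cdf - limit_cdf))
    return worst


for m in (100, 400, 1600):
    print(m, kolmogorov_distance(m))
```

The distance shrinks roughly like $1/\sqrt{m} = \sqrt{h}$, as expected from the lattice structure of the walk.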

The proof of Lemma 5.2 then proceeds as follows. Set $h = n^{-2/3}$ and define a Markov chain in the variables $u_0, u$ (see Eq. (5.8)) using the transition rates $W(\Delta|x)$ and the initial condition $u_0(0) = \zeta$, $u(0) = 0$. One then just has to compute the generator

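$$
(\mathcal{L}\phi)(u_0, u) = \lim_{n \to \infty} n^{2/3} \sum_{\Delta_0, \Delta} \Big[\phi\big(u_0 + n^{-1/3}\Delta_0,\ u + n^{-2/3}\Delta\big) - \phi(u_0, u)\Big]\, W\big(\Delta_0, \Delta \,\big|\, n^{-2/3}u_0,\ n^{-1}x_* + n^{-1/3}u\big), \tag{D.3}
$$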



where we made the substitution $W(\Delta|x) \to W(\Delta|x/n)$, which implies a negligible $O(1/n)$ error. Formula (5.10) is easily obtained by Taylor expanding the above equation.
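To see how the Taylor expansion turns a generator of this type into a diffusion operator, consider a hypothetical, state-independent rate $W$ (not the one of the paper): the $u$-increment is always $\Delta = 1$, and $\Delta_0 \in \{+1, -1, 0\}$ with probabilities $q/2 + \beta n^{-1/3}/2$, $q/2 - \beta n^{-1/3}/2$, $1 - q$, so that $\mathbb{E}\Delta_0 = \beta n^{-1/3}$ and $\mathbb{E}\Delta_0^2 = q$. Assuming the scalings $u_0 \mapsto u_0 + n^{-1/3}\Delta_0$, $u \mapsto u + n^{-2/3}\Delta$ and the prefactor $n^{2/3} = 1/h$, expanding to second order gives $\beta\,\partial_{u_0} + \frac{q}{2}\,\partial_{u_0}^2 + \partial_u$, with corrections of order $n^{-2/3}$. The sketch compares the finite-$n$ expression with that limit on a test function; $q$, $\beta$, `phi` and the function names are all illustrative choices.

```python
import math


def phi(u0, u):
    """Smooth test function (arbitrary choice for the check)."""
    return math.sin(u0) * math.exp(u)


def finite_n_generator(u0, u, n, q=0.5, beta=0.3):
    """n^{2/3} * sum over (Delta0, Delta) of [phi(shifted) - phi(u0, u)] * W,
    for the hypothetical W described above (Delta = 1 always)."""
    s = n ** (-1.0 / 3.0)  # u0 moves by s*Delta0, u moves by s^2 = n^{-2/3}
    law = {+1: q / 2 + beta * s / 2, -1: q / 2 - beta * s / 2, 0: 1.0 - q}
    base = phi(u0, u)
    acc = sum(p * (phi(u0 + s * d0, u + s * s) - base) for d0, p in law.items())
    return acc / (s * s)  # multiply by n^{2/3} = 1/h


def limit_generator(u0, u, q=0.5, beta=0.3):
    """Second-order Taylor limit: beta*d/du0 + (q/2)*d^2/du0^2 + d/du applied to phi."""
    e = math.exp(u)
    return beta * math.cos(u0) * e - 0.5 * q * math.sin(u0) * e + math.sin(u0) * e


for n in (1e3, 1e6):
    print(n, abs(finite_n_generator(0.7, 0.2, n) - limit_generator(0.7, 0.2)))
```

Going from $n = 10^3$ to $n = 10^6$ should reduce the discrepancy by roughly a factor $100 = (10^3)^{2/3}$, consistent with $O(n^{-2/3})$ corrections.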



